RE: Load balancing with solr cloud

2016-10-21 Thread Garth Grimm
I just realized that I made an assumption about your initial question that may 
not be true.

Everything I've said has been based on handling requests to add/update 
documents during the indexing process.  That process involves the "leader 
first" concept I've been mentioning.

So, to answer your original question on the query side:

> Actually, zookeeper really won't participate in the query process at all.  
> And the leader role for a core in a shard has no bearing whatsoever.
>
> ;-) Read ymonad's answer. ;-)  The CloudSolrServer class has been renamed to 
> CloudSolrClient (or something similar) recently, but otherwise, I think his 
> answer is still basically correct.

It's worth noting that even if the node that receives the request has a core 
that could participate in generating results, it might ask some other core of 
that same shard to return the results for that shard.  The preferLocalShards 
parameter can be used to avoid that (near the bottom of 
https://cwiki.apache.org/confluence/display/solr/Distributed+Requests).  
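For example, something along these lines (host and collection name are 
placeholders) keeps the per-shard requests on cores local to the node that 
received the query, whenever such cores exist:

http://solr-node1:8983/solr/mycollection/select?q=*:*&preferLocalShards=true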

In any case, if you have many shards, load balancing on the query side is 
definitely more important than on the indexing side.  The query controller will 
have to merge the result sets (one from each shard), initiate the second 
pass of requests to get stored fields, and then marshal all that data back 
through the HTTP response.  That's more extra work than the controller has to 
do for an update request, which is basically just passing along whatever 
information the shard leader responded with.

And load balancing for reliability purposes is always a good thing.

>>> Also, for indexing, I think it's possible to control how many replicas need 
>>> to confirm to the leader before the response is supplied to the client, as 
>>> you can with say MongoDB replicas.

Yes, that's possible.  It's what I was thinking about when I mentioned 
"...general case flow".  That capability is relatively new, and not the 
default, which is why I didn't mention it.
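If it helps, the parameter I believe is being referred to is min_rf.  As I 
understand it, the leader still sends the update to all live replicas either 
way; min_rf just makes the response report the achieved replication factor 
("rf") so the client can retry if it comes back lower than requested.  A sketch 
with curl (host, collection, and document are placeholders):

curl 'http://solr-node1:8983/solr/mycollection/update?commit=true&min_rf=2' \
     -H 'Content-Type: application/json' \
     -d '[{"id":"doc-1","title_t":"example"}]'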

-Original Message-
From: hairymccla...@yahoo.com.INVALID [mailto:hairymccla...@yahoo.com.INVALID] 
Sent: Friday, October 21, 2016 4:07 AM
To: solr-user@lucene.apache.org
Subject: Re: Load balancing with solr cloud

As I understand it for non-SolrCloud aware clients you have to manually load 
balance your searches, see ymonad's answer here:
http://stackoverflow.com/questions/22523588/loadbalancer-and-solrcloud

This is from 2014 so maybe this has changed now - would be interested to know 
as well.
Also, for indexing, I think it's possible to control how many replicas need to 
confirm to the leader before the response is supplied to the client, as you can 
with say MongoDB replicas.

 

On Friday, October 21, 2016 1:18 AM, Garth Grimm 
<garthgr...@averyranchconsulting.com> wrote:
 

 No matter where you send the update to initially, it will get sent to the 
leader of the shard first.  The leader parses it to ensure it can be 
indexed, then it will send it to all the replicas in parallel.  The replicas 
will do their parsing and report back that they have persisted the data to 
their tlogs.  Once the leader hears back from all the replicas, the leader will 
reply back that the update is complete, and your client will receive its HTTP 
response on the transaction.

At least that's the general case flow.

So it really won't matter how your load balancing is handled above the cloud.  
All the work is done the same way, with the leader having to do slightly more 
work than the replicas.

If you can manage to initially send all the updates to the correct leader, you 
can skip one hop before the work starts, which may buy you a small performance 
boost compared to randomly picking a node to send the request to.  But you'll 
need to be taxing the cloud pretty heavily before that difference becomes too 
noticeable.

-Original Message-
From: Sadheera Vithanage [mailto:sadhee...@gmail.com]
Sent: Thursday, October 20, 2016 5:55 PM
To: solr-user@lucene.apache.org
Subject: Re: Load balancing with solr cloud

Thank you very much John and Garth,

I've tested it out and it works fine, I can send the updates to any of the solr 
nodes.

If I am not using a zookeeper aware client and if I direct all my queries (read 
queries) always to the leader of the solr instances, does it automatically load 
balance between the replicas?

Or do I have to hit each instance in a round robin way and have the load 
balanced through the code?

Please advise the best way to do so..

Thank you very much again..



On Fri, Oct 21, 2016 at 9:18 AM, Garth Grimm < 
garthgr...@averyranchconsulting.com> wrote:

> Actually, zookeeper really won't participate in the update process at all.
>
> If you're using a "zookeeper aware" client like SolrJ, the SolrJ 
> library will read the cloud configuration from zookeeper, but will 
> send all 

RE: Load balancing with solr cloud

2016-10-20 Thread Garth Grimm
No matter where you send the update to initially, it will get sent to the 
leader of the shard first.  The leader parses it to ensure it can be 
indexed, then it will send it to all the replicas in parallel.  The replicas 
will do their parsing and report back that they have persisted the data to 
their tlogs.  Once the leader hears back from all the replicas, the leader will 
reply back that the update is complete, and your client will receive its HTTP 
response on the transaction.

At least that's the general case flow.

So it really won't matter how your load balancing is handled above the cloud.  
All the work is done the same way, with the leader having to do slightly more 
work than the replicas.

If you can manage to initially send all the updates to the correct leader, you 
can skip one hop before the work starts, which may buy you a small performance 
boost compared to randomly picking a node to send the request to.  But you'll 
need to be taxing the cloud pretty heavily before that difference becomes too 
noticeable.

-Original Message-
From: Sadheera Vithanage [mailto:sadhee...@gmail.com] 
Sent: Thursday, October 20, 2016 5:55 PM
To: solr-user@lucene.apache.org
Subject: Re: Load balancing with solr cloud

Thank you very much John and Garth,

I've tested it out and it works fine, I can send the updates to any of the solr 
nodes.

If I am not using a zookeeper aware client and if I direct all my queries (read 
queries) always to the leader of the solr instances, does it automatically load 
balance between the replicas?

Or do I have to hit each instance in a round robin way and have the load 
balanced through the code?

Please advise the best way to do so..

Thank you very much again..



On Fri, Oct 21, 2016 at 9:18 AM, Garth Grimm < 
garthgr...@averyranchconsulting.com> wrote:

> Actually, zookeeper really won't participate in the update process at all.
>
> If you're using a "zookeeper aware" client like SolrJ, the SolrJ 
> library will read the cloud configuration from zookeeper, but will 
> send all the updates to the leader of the shard that the document is meant to 
> go to.
>
> If you're not using a "zookeeper aware" client, you can send the 
> update to any of the solr nodes, and they will evaluate the cloud 
> configuration information they've already received from zookeeper, and 
> then forward the document to the leader of the shard that will handle the 
> document update.
>
> In general, Zookeeper really only provides the cloud configuration 
> information once (at most) during all the updates, the actual document 
> update only gets sent to solr nodes.  There's definitely no need to 
> distribute load between zookeepers for this situation.
>
> Regards,
> Garth Grimm
>
> -Original Message-
> From: Sadheera Vithanage [mailto:sadhee...@gmail.com]
> Sent: Thursday, October 20, 2016 5:11 PM
> To: solr-user@lucene.apache.org
> Subject: Load balancing with solr cloud
>
> Hi again Experts,
>
> I have a question related to load balancing in solr cloud.
>
> If we have 3 zookeeper nodes and 3 solr instances (1 leader, 2 
> secondary replicas and 1 shard), when the traffic comes in the primary 
> zookeeper server will be hammered, correct?
>
> I understand (or is it wrong) that zookeeper will load balance between 
> solr nodes but if we want to distribute the load between zookeeper 
> nodes as well, what is the best approach.
>
> Cost is a concern for us too.
>
> Thank you very much, in advance.
>
> --
> Regards
>
> Sadheera Vithanage
>



--
Regards

Sadheera Vithanage


RE: Load balancing with solr cloud

2016-10-20 Thread Garth Grimm
Actually, zookeeper really won't participate in the update process at all.

If you're using a "zookeeper aware" client like SolrJ, the SolrJ library will 
read the cloud configuration from zookeeper, but will send all the updates to 
the leader of the shard that the document is meant to go to.

If you're not using a "zookeeper aware" client, you can send the update to any 
of the solr nodes, and they will evaluate the cloud configuration information 
they've already received from zookeeper, and then forward the document to 
the leader of the shard that will handle the document update.
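As a concrete sketch of that second case (class names are from SolrJ 5.x; the 
host, collection, and field names are placeholders):

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

// Point a plain HTTP client at any node; that node forwards the document to
// the leader of the shard that owns this id, using the cluster state it
// already has from zookeeper.
HttpSolrClient client = new HttpSolrClient("http://solr-node2:8983/solr/mycollection");
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "doc-1");
doc.addField("title_t", "example");
client.add(doc);
client.commit();
client.close();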

In general, Zookeeper really only provides the cloud configuration information 
once (at most) during all the updates, the actual document update only gets 
sent to solr nodes.  There's definitely no need to distribute load between 
zookeepers for this situation.

Regards,
Garth Grimm

-Original Message-
From: Sadheera Vithanage [mailto:sadhee...@gmail.com] 
Sent: Thursday, October 20, 2016 5:11 PM
To: solr-user@lucene.apache.org
Subject: Load balancing with solr cloud

Hi again Experts,

I have a question related to load balancing in solr cloud.

If we have 3 zookeeper nodes and 3 solr instances (1 leader, 2 secondary 
replicas and 1 shard), when the traffic comes in the primary zookeeper server 
will be hammered, correct?

I understand (or is it wrong) that zookeeper will load balance between solr 
nodes but if we want to distribute the load between zookeeper nodes as well, 
what is the best approach.

Cost is a concern for us too.

Thank you very much, in advance.

--
Regards

Sadheera Vithanage


RE: FAST to SOLR migration

2016-09-23 Thread Garth Grimm
Have you evaluated whether the "mm" parameter might help?

https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser#TheDisMaxQueryParser-Themm(MinimumShouldMatch)Parameter
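For instance, assuming the terms end up in a field called "ngram" (the field and 
host names here are just placeholders), an edismax query along these lines would 
require at least 3 of the 6 terms to match instead of just 1:

http://localhost:8983/solr/collection1/select?defType=edismax&qf=ngram&mm=3&q=750+500+000+000+000+000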

-Original Message-
From: preeti kumari [mailto:preeti.bg...@gmail.com] 
Sent: Friday, September 23, 2016 5:32 AM
To: solr-user@lucene.apache.org
Subject: FAST to SOLR migration

Hi All,

I am trying to migrate FAST esp to SOLR search engine.

I am trying to implement mode="ONEAR" from FAST in solr.

Please let me know if anyone has any idea about this.

ngram:string("750 500 000 000 000 000",mode="ONEAR")

In solr we are splitting to split field in "750 500 000 000 000 000" but it 
gives me matches even if only one of the terms matches, e.g. a match with ngram 
as 750. This results in lots of irrelevant matches. I need matches where at 
least 3 terms from ngram match.

Thanks
Preeti


RE: Clarity on Sharding Concepts.

2016-05-31 Thread Garth Grimm
Both.

One shard will have roughly half the documents, and the indices built from 
them; the other shard will have the other half of the documents, and the 
indices built from those.

There won't be one location that contains all the documents, nor all the 
indices.

-Original Message-
From: Siddhartha Singh Sandhu [mailto:sandhus...@gmail.com] 
Sent: Tuesday, May 31, 2016 10:43 AM
To: solr-user@lucene.apache.org; muge...@gmail.com
Subject: Re: Clarity on Sharding Concepts.

Hi Mugeesh,

I was speculating whether sharding is done on:
1. index terms with each shard having the whole document space.
2. document space with each shard have num(documents/no. of shards) of the 
documents divided between them.

Regards,

Sid.

On Tue, May 31, 2016 at 12:19 AM, Mugeesh Husain  wrote:

> Hi,
>
> To read out this document
>
> https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+D
> ata+in+SolrCloud
> for proper understanding.
>
> FYI, you are using implicit router, a document will be divided 
> randomly based on hashing technique.
>
> If you indexed 50 documents, it will be divided into 2 parts, 1 goes 
> to shard1, second one is shard2 and same document will be go their 
> replica respectively .
>
>
> Thanks
> Mugeesh
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Clarity-on-Sharding-Concepts-tp4279
> 842p4279856.html Sent from the Solr - User mailing list archive at 
> Nabble.com.
>


RE: number of zookeeper & aws instances

2016-04-13 Thread Garth Grimm
I thought that if you start with 3 Zk nodes in the ensemble, and only lose 1, 
it will have no effect on indexing at all, since you still have a quorum.

If you lose 2 (which takes you below quorum), then the cloud loses "confidence" 
in which solr core is the leader of each shard and stops indexing.  But queries 
will continue since no zk managed information is needed for that.

Please correct me if I'm wrong, on any of that.
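(For reference, a ZooKeeper ensemble of N nodes needs floor(N/2)+1 of them alive 
to keep quorum, so 3 nodes tolerate 1 failure, 5 tolerate 2, and 7 tolerate 3.)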

-Original Message-
From: Daniel Collins [mailto:danwcoll...@gmail.com] 
Sent: Wednesday, April 13, 2016 10:34 AM
To: solr-user@lucene.apache.org
Subject: Re: number of zookeeper & aws instances

Just to chip in, more ZKs are probably only necessary if you are doing NRT 
indexing.

Loss of a single ZK (in a 3 machine setup) will block indexing for the time it 
takes to get that machine/instance back up, however it will have less impact on 
search, since the search side can use the existing state of the cloud to work.  
If you only index once a day, then that's fine, but in our scenario, we 
continually index all day long, so we can't afford a "break".
Hence we actually run 7 ZKs currently though we plan to go down to 5.  That 
gives us the ability to lose 2 machines without affecting indexing.

But as Erick says, for "normal" scenarios, where search load is much greater 
than indexing load, 3 should be sufficient.


On 13 April 2016 at 15:27, Erick Erickson  wrote:

> bq: or is it dependent on query load and performance sla's
>
> Exactly. The critical bit is that every single replica meets your SLA.
> By that I mean let's claim that your SLA is 500ms. If you can serve 10 
> qps at that SLA with one replica/shard (i.e. leader only) you can 
> server 50 QPS by adding 4 more replicas.
>
> What you _cannot_ do is reduce the 500ms response time by adding more 
> replicas. You'll need to add more shards, which probably means 
> re-indexing. Which is why I recommend pushing a test system to 
> destruction before deciding on the final numbers.
>
> And having at least 2 replicas shard (leader and replica) is usually a 
> very good thing because Solr will stop serving queries or indexing if 
> all the replicas for any shard are down.
>
> Best,
> Erick
>
> On Wed, Apr 13, 2016 at 7:19 AM, Jay Potharaju 
> wrote:
> > Thanks for the feedback Eric.
> > I am assuming the number of replicas help in load balancing and
> reliability. That being said are there any recommendation for that, or 
> is it dependent on query load and performance sla's.
> >
> > Any suggestions on aws setup?
> > Thanks
> >
> >
> >> On Apr 13, 2016, at 7:12 AM, Erick Erickson 
> >> 
> wrote:
> >>
> >> For collections with this few nodes, 3 zookeepers are plenty. From 
> >> what I've seen people don't go to 5 zookeepers until they have 
> >> hundreds and hundreds of nodes.
> >>
> >> 100M docs can fit on 2 shards, I've actually seen many more. That 
> >> said, if the docs are very large and/or the searchers are complex 
> >> performance may not be what you need. Here's a long blog on testing 
> >> a configuration to destruction to be _sure_ you can scale as you 
> >> need:
> >>
> >>
> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract
> -why-we-dont-have-a-definitive-answer/
> >>
> >> Best,
> >> Erick
> >>
> >>> On Wed, Apr 13, 2016 at 6:47 AM, Jay Potharaju 
> >>> 
> wrote:
> >>> Hi,
> >>>
> >>> In my current setup I have about 30 million docs which will grow 
> >>> to 100 million by the end of the year. In order to accommodate 
> >>> scaling and
> query
> >>> load, i am planning to have atleast 2 shards and 2/3 replicas to 
> >>> begin with. With the above solrcloud setup I plan to have 3 
> >>> zookeepers in the quorum.
> >>>
> >>> If the number of replicas and shards increases, the number of solr 
> >>> instances will also go up. With keeping that in mind I was 
> >>> wondering if there are any guidelines on the number of zk 
> >>> instances to solr
> instances.
> >>>
> >>> Secondly are there any recommendations for setting up solr in AWS?
> >>>
> >>> --
> >>> Thanks
> >>> Jay
>


RE: Indexing using a collection alias

2015-12-22 Thread Garth Grimm
Yes.

-Original Message-
From: Yago Riveiro [mailto:yago.rive...@gmail.com] 
Sent: Tuesday, December 22, 2015 5:51 AM
To: solr-user@lucene.apache.org
Subject: Indexing using a collection alias

Hi,

Is it possible to index documents using the alias and not the collection name, if 
the alias only points to one collection?

The Solr collection API doesn't allow renaming a collection, so I want to know 
if I can achieve this functionality with aliases.

All documentation that I googled use the alias for read operations ...



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-using-a-collection-alias-tp4246521.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: optimize status

2015-06-29 Thread Garth Grimm
 Is there really a good reason to consolidate down to a single segment?

Archiving (as one example).  Come July 1, the collection for log
entries/transactions in June will never be changed, so optimizing is
actually a good thing to do.

Kind of getting away from OP's question on this, but I don't think the
ability to move data between shards in SolrCloud (such as shard splitting)
has much to do with the Lucene segments under the hood.  I'm just guessing,
but I'd think the main issue with shard splitting would be to ensure that
document route ranges are handled properly, and I don't think the value used
for routing has anything to do with what segment they happen to be stored
into.
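For that kind of frozen, archival collection, the forced merge is just an update 
command; a sketch (the core/collection name is a placeholder, and maxSegments 
defaults to 1 anyway):

curl 'http://localhost:8983/solr/logs_june_2015/update?optimize=true&maxSegments=1'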

-Original Message-
From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org] 
Sent: Monday, June 29, 2015 11:38 AM
To: solr-user@lucene.apache.org
Subject: RE: optimize status

Is there really a good reason to consolidate down to a single segment?

Any incremental query performance benefit is tiny compared to the loss of
managability.   

I.e. shouldn't segments _always_ be kept small enough to facilitate
re-balancing data across shards?   Even in non-cloud instances this is true.
When a collection grows, you may want shard/split an existing index by
adding a node and moving some segments around.Isn't this the direction
Solr is going?   With many, smaller segments, this is feasible.  With one
big segment, the collection must always be reindexed.

Thus, optimize would mean, get rid of all deleted records and would, in
fact, optimize queries by eliminating wasted I/O.   Perhaps worth it for
slowly changing indexes.   Seems like the Tiered merge policy is 90% there
...Or am I all wet (again)?

-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org]
Sent: Monday, June 29, 2015 10:39 AM
To: solr-user@lucene.apache.org
Subject: Re: optimize status

Optimize is a manual full merge.

Solr automatically merges segments as needed. This also expunges deleted
documents.

We really need to rename optimize to force merge. Is there a Jira for
that?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

On Jun 29, 2015, at 5:15 AM, Steven White swhite4...@gmail.com wrote:

 Hi Upayavira,
 
 This is news to me that we should not optimize and index.
 
 What about disk space saving, isn't optimization to reclaim disk space 
 or is Solr somehow does that?  Where can I read more about this?
 
 I'm on Solr 5.1.0 (may switch to 5.2.1)
 
 Thanks
 
 Steve
 
 On Mon, Jun 29, 2015 at 4:16 AM, Upayavira u...@odoko.co.uk wrote:
 
 I'm afraid I don't understand. You're saying that optimising is 
 causing performance issues?
 
 Simple solution: DO NOT OPTIMIZE!
 
 Optimisation is very badly named. What it does is squashes all 
 segments in your index into one segment, removing all deleted 
 documents. It is good to get rid of deletes - in that sense the index is
optimized.
 However, future merges become very expensive. The best way to handle 
 this topic is to leave it to Lucene/Solr to do it for you. Pretend 
 the optimize option never existed.
 
 This is, of course, assuming you are using something like Solr 3.5+.
 
 Upayavira
 
 On Mon, Jun 29, 2015, at 08:08 AM, Summer Shire wrote:
 
 Have to cause of performance issues.
 Just want to know if there is a way to tap into the status.
 
 On Jun 28, 2015, at 11:37 PM, Upayavira u...@odoko.co.uk wrote:
 
 Bigger question, why are you optimizing? Since 3.6 or so, it 
 generally hasn't been requires, even, is a bad thing.
 
 Upayavira
 
 On Sun, Jun 28, 2015, at 09:37 PM, Summer Shire wrote:
 Hi All,
 
 I have two indexers (Independent processes ) writing to a common 
 solr core.
 If One indexer process issued an optimize on the core I want the 
 second indexer to wait adding docs until the optimize has 
 finished.
 
 Are there ways I can do this programmatically?
 pinging the core when the optimize is happening is returning OK
 because
 technically
 solr allows you to update when an optimize is happening.
 
 any suggestions ?
 
 thanks,
 Summer
 






RE: Connecting to a Solr server remotely

2015-06-22 Thread Garth Grimm
Check the firewall settings on the Linux machine.

By default, mine block port 8983, so the request never even gets to Jetty/Solr.
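For example, depending on the distribution, one of these (run as root on the 
Linux machine) usually opens the port; treat them as sketches rather than exact 
commands for your setup:

# firewalld (RHEL/CentOS 7, Fedora)
firewall-cmd --permanent --add-port=8983/tcp
firewall-cmd --reload

# plain iptables
iptables -I INPUT -p tcp --dport 8983 -j ACCEPT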

-Original Message-
From: Paden [mailto:rumsey...@gmail.com] 
Sent: Monday, June 22, 2015 2:48 PM
To: solr-user@lucene.apache.org
Subject: Connecting to a Solr server remotely

Hello, 

I've set up a Solr server on my Linux Virtual Machine. Now I'm trying to access 
it remotely on my Windows Machine using an http request from a browser. 

Any time I try to access it with a request such as http//localhost:8983/solr 
I always get a connection error (with the server running on the linux virtual 
machine, it's not a because I forgot to turn the service on) I know that my 
server is probably set to take requests specifically from my virtual machine so 
I need to change that.  

From the several hours of research I've done on the web it seems like I need 
to change jetty.xml in the /etc/jetty directory. But others suggest I need to 
make a change to solr.config itself. There's a lot of conflicting info and 
it's pretty much got me randomly changing things in jetty.xml and solr.config 
and nothing's worked as of yet. 

If anybody has any idea how to get this to work I would greatly appreciate it. 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Connecting-to-a-Solr-server-remotely-tp4213335.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr Logging

2015-06-19 Thread Garth Grimm
Framework way?

Maybe try delving into the log4j framework and modifying the log4j.properties 
file.  You can generate different log files based upon what class generated the 
message.  Here's an example that I experimented with previously; it generates 
an update log and 2 different query logs with slightly different information 
about each query.

Adding a component to each requestHandler dedicated to logging might be the 
best way, but that might not qualify as a framework way, and I've never tried 
anything like that, so I don't know how easy it might be.

Just sending the relevant lines from log4j.properties, excluding the lines  
that are there by default.

# Logger for updates
log4j.logger.org.apache.solr.update.processor.LogUpdateProcessor=INFO, Updates

#- size rotation with log cleanup.
log4j.appender.Updates=org.apache.log4j.RollingFileAppender
log4j.appender.Updates.MaxFileSize=4MB
log4j.appender.Updates.MaxBackupIndex=9

#- File to log to and log format
log4j.appender.Updates.File=${solr.log}/solr_Updates.log
log4j.appender.Updates.layout=org.apache.log4j.PatternLayout
log4j.appender.Updates.layout.ConversionPattern=%-5p - %d{yyyy-MM-dd HH:mm:ss.SSS}; %C; %m\n

# Logger for queries, using SolrDispatchFilter
log4j.logger.org.apache.solr.servlet.SolrDispatchFilter=DEBUG, queryLog1

#- size rotation with log cleanup.
log4j.appender.queryLog1=org.apache.log4j.RollingFileAppender
log4j.appender.queryLog1.MaxFileSize=4MB
log4j.appender.queryLog1.MaxBackupIndex=9

#- File to log to and log format
log4j.appender.queryLog1.File=${solr.log}/solr_queryLog1.log
log4j.appender.queryLog1.layout=org.apache.log4j.PatternLayout
log4j.appender.queryLog1.layout.ConversionPattern=%-5p - %d{yyyy-MM-dd HH:mm:ss.SSS}; %C; %m\n

# Logger for queries, using SolrCore
log4j.logger.org.apache.solr.core.SolrCore=INFO, queryLog2

#- size rotation with log cleanup.
log4j.appender.queryLog2=org.apache.log4j.RollingFileAppender
log4j.appender.queryLog2.MaxFileSize=4MB
log4j.appender.queryLog2.MaxBackupIndex=9

#- File to log to and log format
log4j.appender.queryLog2.File=${solr.log}/solr_queryLog2.log
log4j.appender.queryLog2.layout=org.apache.log4j.PatternLayout
log4j.appender.queryLog2.layout.ConversionPattern=%-5p - %d{yyyy-MM-dd HH:mm:ss.SSS}; %C; %m\n


-Original Message-
From: rbkumar88 [mailto:rbkuma...@gmail.com] 
Sent: Thursday, June 18, 2015 10:41 AM
To: solr-user@lucene.apache.org
Subject: Solr Logging

Hi,

I want to log Solr search queries/response time and Solr indexing log 
separately in different set of log files.
Is there any convenient framework/way to do it.

Thanks
Bharath



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Logging-tp4212730.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Is copyField a must?

2015-05-14 Thread Garth Grimm
Yes, it does support POST.  As to the format, I believe that's handled by the 
container.  So if you're URL-encoding the parameter values, you'll probably 
need to set the Content-Type: application/x-www-form-urlencoded header on the 
HTTP POST.
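A sketch with curl (host and collection are placeholders; --data-urlencode 
handles the escaping and sends the body as application/x-www-form-urlencoded):

curl 'http://localhost:8983/solr/collection1/select' \
     --data-urlencode 'q=title:foo' \
     --data-urlencode 'fl=id,title,score' \
     --data-urlencode 'rows=10'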

-Original Message-
From: Steven White [mailto:swhite4...@gmail.com] 
Sent: Thursday, May 14, 2015 3:12 PM
To: solr-user@lucene.apache.org
Subject: Re: Is copyField a must?

Anyone knows the answer to Shawn's question?  Does Solr support POST request 
and is the format the same as GET?

If it does than it means I don't have to create multiple request handlers.

Thanks

Steve

On Wed, May 13, 2015 at 6:02 PM, Shawn Heisey apa...@elyograg.org wrote:

 On 5/13/2015 3:36 PM, Steven White wrote:
  Note, I want to avoid a URL base solution (sending the list of 
  fields
 over
  HTTP) because the list of fields could be large (1000+) and thus I 
  will exceed GET limit quickly (does Solr support POST for searching, 
  if so,
 than
  I can use URL base solution?)

 Solr does indeed support a query sent as the body in a POST request.
 I'm not completely positive, but I think you'd use the same format as 
 you put on the URL:

 q=foo&rows=1&fq=bar

 If anyone knows for sure what should be in the POST body, please let 
 me and Steven know.  In particular, should the content be URL escaped, 
 as might be required for a GET?

 Thanks,
 Shawn





RE: Remote connection to Solr

2015-04-24 Thread Garth Grimm
Shawn's explanation fits better with why Websphere and Jetty might behave 
differently.  But something else that might be happening could be if the DHCP 
negotiation causes the IP address to change from one network to another and 
back.

-Original Message-
From: Steven White [mailto:swhite4...@gmail.com] 
Sent: Friday, April 24, 2015 9:23 AM
To: solr-user@lucene.apache.org
Subject: Re: Remote connection to Solr

Hi Shawn,

The firewall was the first thing I looked into and after fiddling with it, I 
still see the issue.  But if that was the issue, why WebSphere doesn't run into 
it but Jetty is?  However, your point about domain / non domain and private / 
public network maybe provide me with some new area to look into.

Thanks

Steve

On Fri, Apr 24, 2015 at 10:11 AM, Shawn Heisey apa...@elyograg.org wrote:

 On 4/24/2015 8:03 AM, Steven White wrote:
  This maybe a Jetty question but let me start here first.
 
  I have Solr running on my laptop and from my desktop I have no issue 
  accessing it.  However, if I take my laptop home and connect it to 
  my
 home
  network, the next day when I connect the laptop to my office 
  network, I
 no
  longer can access Solr from my desktop.  A restart of Solr will not 
  do,
 the
  only fix is to restart my Windows 8.1 OS (that's what's on my laptop).
 
  I have not been able to figure out why this is happening and I'm
 suspecting
  it has to do something with Jetty because I have Solr 3.6 running on 
  my laptop in a WebSphere profile and it does not run into this issue.
 
  Any ideas what could be causing this?  Is this question for the 
  Jetty mailing list?

 I'm guessing the Windows firewall is the problem here.  I'm betting 
 your computer is detecting your home network and the office network as 
 two different types (one as domain, the other as private, possibly), 
 and that the Windows firewall only allows connections to Jetty when 
 you are on one of those types of networks.  The websphere install may 
 have add explicit firewall exceptions for all network types when it was 
 installed.

 Fiddling with the firewall exceptions is probably the way to fix this.

 Thanks,
 Shawn




RE: Solrcloud Index corruption

2015-03-05 Thread Garth Grimm
For updates, the document will always get routed to the leader of the 
appropriate shard, no matter what server first receives the request.

-Original Message-
From: Martin de Vries [mailto:mar...@downnotifier.com] 
Sent: Thursday, March 05, 2015 4:14 PM
To: solr-user@lucene.apache.org
Subject: Re: Solrcloud Index corruption

Hi Erick,

Thank you for your detailed reply.

You say in our case some docs didn't make it to the node, but that's not really 
true: the docs can be found on the corrupted nodes when I search on ID. The 
docs are also complete. The problem is that the docs do not appear when I 
filter on certain fields (however the fields are in the doc and have the right 
value when I search on ID). So something seems to be corrupt in the filter 
index. We will try the checkindex, hopefully it is able to identify the 
problematic cores.

I understand there is not a master in SolrCloud. In our case we use haproxy 
as a load balancer for every request. So when indexing every document will be 
sent to a different solr server, immediately after each other. Maybe SolrCloud 
is not able to handle that correctly?


Thanks,

Martin




Erick Erickson schreef op 05.03.2015 19:00:

 Wait up. There's no master index in SolrCloud. Raw documents are 
 forwarded to each replica, indexed and put in the local tlog. If a 
 replica falls too far out of synch (say you take it offline), then the 
 entire index _can_ be replicated from the leader and, if the leader's 
 index was incomplete then that might propagate the error.

 The practical consequence of this is that if _any_ replica has a 
 complete index, you can recover. Before going there though, the 
 brute-force approach is to just re-index everything from scratch.
 That's likely easier, especially on indexes this size.

 Here's what I'd do.

 Assuming you have the Collections API calls for ADDREPLICA and 
 DELETEREPLICA, then:
 0 Identify the complete replicas. If you're lucky you have at least
 one for each shard.
 1 Copy 1 good index from each shard somewhere just to have a backup.
 2 DELETEREPLICA on all the incomplete replicas
 2.5 I might shut down all the nodes at this point and check that all 
 the cores I'd deleted were gone. If any remnants exist, 'rm -rf 
 deleted_core_dir'.
 3 ADDREPLICA to get the ones removed in back.

 should copy the entire index from the leader for each replica. As you 
 do the leadership will change and after you've deleted all the 
 incomplete replicas, one of the complete ones will be the leader and 
 you should be OK.

 If you don't want to/can't use the Collections API, then
 0 Identify the complete replicas. If you're lucky you have at least
 one for each shard.
 1 Shut 'em all down.
 2 Copy the good index somewhere just to have a backup.
 3 'rm -rf data' for all the incomplete cores.
 4 Bring up the good cores.
 5 Bring up the cores that you deleted the data dirs from.

 What should do is replicate the entire index from the leader. When you 
 restart the good cores (step 4 above), they'll _become_ the leader.

 bq: Is it possible to make Solrcloud invulnerable for network problems 
 I'm a little surprised that this is happening. It sounds like the 
 network problems were such that some nodes weren't out of touch long 
 enough for Zookeeper to sense that they were down and put them into 
 recovery. Not sure there's any way to secure against that.

 bq: Is it possible to see if a core is corrupt?
 There's CheckIndex, here's at least one link:
 http://java.dzone.com/news/lucene-and-solrs-checkindex
 What you're describing, though, is that docs just didn't make it to 
 the node, _not_ that the index has unexpected bits, bad disk sectors 
 and the like so CheckIndex can't detect that. How would it know what 
 _should_ have been in the index?

 bq: I noticed a difference in the Gen column on Overview - 
 Replication. Does this mean there is something wrong?
 You cannot infer anything from this. In particular, the merging will 
 be significantly different between a single full-reindex and what the 
 state of segment merges is in an incrementally built index.

 The admin UI screen is rooted in the pre-cloud days, the Master/Slave 
 thing is entirely misleading. In SolrCloud, since all the raw data is 
 forwarded to all replicas, and any auto commits that happen may very 
 well be slightly out of sync, the index size, number of segments, 
 generations, and all that are pretty safely ignored.

 Best,
 Erick

 On Thu, Mar 5, 2015 at 6:50 AM, Martin de Vries 
 mar...@downnotifier.com
 wrote:

 Hi Andrew, Even our master index is corrupt, so I'm afraid this won't 
 help in our case. Martin Andrew Butkus schreef op 05.03.2015 16:45:

 Force a fetchindex on slave from master command:
 http://slave_host:port/solr/replication?command=fetchindex - from 
 http://wiki.apache.org/solr/SolrReplication [1] The above command 
 will download the whole index from master to slave, there are 
 configuration options in solr to make this 

RE: Does shard splitting double host count

2015-02-27 Thread Garth Grimm
Well, if you're going to reindex on a newer version, just start out with the
number of shards you feel is appropriate, and reindex.

But yes, if you had 3 shards, wanted to split some of them, you'd really
have to split all of them (making 6), if you wanted the shards to be about
the same size.

As to hosts needed, if the hosts are large enough, you could run 6 shards with 2 replicas
(12 cores total) on just 2 hosts.  Or up to 12 hosts.  Or something in
between.  Just depends on how many cores you can fit on a host.

-Original Message-
From: tuxedomoon [mailto:dancolem...@yahoo.com] 
Sent: Friday, February 27, 2015 8:16 AM
To: solr-user@lucene.apache.org
Subject: Does shard splitting double host count

I currently have a SolrCloud with 3 shards + replicas, it is holding 130M
documents and the r3.large hosts are running out of memory. As it's on 4.2
there is no shard splitting, I will have to reindex to a 4.3+ version.

If I had that feature would I need to split each shard into 2 subshards
resulting in a total of 6 subshards, in order to keep all shards relatively
equal?

And since host memory is the problem I'd be migrating subshards to new
hosts. So it seems I'd be going from 6 hosts to 12.  Are these assumptions
correct or is there a way to avoid doubling my host count?




--
View this message in context:
http://lucene.472066.n3.nabble.com/Does-shard-splitting-double-host-count-tp
4189595.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Does shard splitting double host count

2015-02-27 Thread Garth Grimm
You can't just add a new core to an existing collection.  You can add the new 
node to the cloud, but it won't be part of any collection.  You're not going to 
be able to just slide it in as a 4th shard to an established collection of 3 
shards.

The root of that comes from routing (I'll assume you use default routing, 
rather than any custom routing).  When you index a document into the cloud, it 
gets a unique id number attached to it.  If you have 3 shards, than each shard 
gets 1/3 of the range of those possible ids.  Inserts and/or updates for the 
same document will have the same id and be routed to the same shard.

Shard splitting just divides the range of the shard in half, and copies 
documents to the 2 new shards based upon where their ids now fall in the new 
range.  That's a little easier to manage than the more complex process of 
adding one shard, then having to adjust the ranges on all the other shards, and 
then copying entries that have to move -- all the while ensuring that new 
adds/updates/deletes are being routed to the correct location based upon 
whether the original has been copied over to the new ranges or not, yada, yada, 
yada.  I believe there have been some discussions about how to add a capability 
like that to solr (i.e. adjust shard ranges and have documents moved and 
handled correctly), but I don't think it's even in 5.0.

Now, if you feel the need to go down this path of adding a single shard to a 3 
shard collection, here's something similar.  Add your new solr node to the 
cloud.  Then create a 1 shard, 2 replica collection called collectionPart2.   
Also add a query alias for TotalCollection that points to collectionPart1, 
collectionPart2.   That way a query will get processed by all 4 of your 
shards.  Now this will make indexing more difficult, because you'll have to 
send your new documents to collectionPart2 until that collection's shard gets 
about as big as the shards on your 3 shard collection.  But some source data 
can be split up like that fairly easily, especially sequential data sources.

For example, if indexing twitter or email feeds, you can create a new collection 
with appropriate shard/replica configuration and feed in a day (or month, or 
whatever) of data.  Then repeat with a new collection for the next set.  Keep 
the query alias updated to span the collections you're interested in.
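For the alias piece, the Collections API call would look roughly like this (the 
host and names are placeholders):

http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=TotalCollection&collections=collectionPart1,collectionPart2

Re-issuing CREATEALIAS with a new collections list repoints the alias, so you can 
add collectionPart3, drop old parts, and so on.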

-Original Message-
From: tuxedomoon [mailto:dancolem...@yahoo.com] 
Sent: Friday, February 27, 2015 12:43 PM
To: solr-user@lucene.apache.org
Subject: Re: Does shard splitting double host count

What about adding one new leader/replica pair?  It seems that would entail

a) creating the r3.large instances and volumes
b) adding 2 new Zookeeper hosts?
c) updating my Zookeeper configs (new hosts, new ids, new SOLR config)
d) restarting all ZKs
e) restarting SOLR hosts in sequence needed for correct shard/replica assignment
f)  start indexing again

So shards 1,2,3 start with 33% of the docs each.  As I start indexing new 
documents get sharded at 25% per shard.  If I reindex a document that exists 
already in shard2, does it remain in shard2 or could it migrate to another 
shard, thus removing it from shard2.

I'm looking for a migration strategy to achieve 25% docs per shard.  I would 
also consider deleting docs by daterange from shards1,2,3 and reindexing them 
to redistribute evenly.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Does-shard-splitting-double-host-count-tp4189595p4189672.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: fl rename of unique key in solrcloud

2014-11-15 Thread Garth Grimm
I see the same issue on 4.10.1.

I’ll open a JIRA if I don’t see one.

I guess the best immediate workaround is to copy the unique field, and use 
that field for renaming?
 On Nov 15, 2014, at 3:18 AM, Suchi Amalapurapu su...@bloomreach.com wrote:
 
 Solr version:4.6.1
 
 On Sat, Nov 15, 2014 at 12:24 PM, Jeon Woosung jeonwoos...@gmail.com
 wrote:
 
 Could you let me know version of the solr?
 
 On Sat, Nov 15, 2014 at 5:05 AM, Suchi Amalapurapu su...@bloomreach.com
 wrote:
 
 Hi
 Getting the following exception when using fl renaming with unique key in
 the schema.
 http://host_name/solr/collection_name/select?q=dress&fl=a1:p1
 
 where p1 is the unique key for collection_name
 For collections with single shard, this works flawlessly but results in
 the
 following exception in case of multiple shards.
 
 How do we fix this? Stack trace below.
 Suchi
 
 error: {trace: java.lang.NullPointerException\n\tat
 
 
 org.apache.solr.handler.component.QueryComponent.returnFields(QueryComponent.java:998)\n\tat
 
 
 org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:653)\n\tat
 
 
 org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:628)\n\tat
 
 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:311)\n\tat
 
 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)\n\tat
 org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)\n\tat
 
 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:721)\n\tat
 
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:417)\n\tat
 
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)\n\tat
 
 
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)\n\tat
 
 
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)\n\tat
 
 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)\n\tat
 
 
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)\n\tat
 
 
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)\n\tat
 
 
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)\n\tat
 
 
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)\n\tat
 
 
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)\n\tat
 
 
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)\n\tat
 
 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)\n\tat
 
 
 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)\n\tat
 
 
 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)\n\tat
 
 
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)\n\tat
 org.eclipse.jetty.server.Server.handle(Server.java:368)\n\tat
 
 
 org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)\n\tat
 
 
 org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)\n\tat
 
 
 org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)\n\tat
 
 
 org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)\n\tat
 org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)\n\tat
 
 org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)\n\tat
 
 
 org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)\n\tat
 
 
 org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)\n\tat
 
 
 org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)\n\tat
 
 
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)\n\tat
 java.lang.Thread.run(Thread.java:662)\n,code: 500
 
 
 
 
 --
 *God bless U*
 



Re: fl rename of unique key in solrcloud

2014-11-15 Thread Garth Grimm
https://issues.apache.org/jira/browse/SOLR-6744 created.

And hopefully correctly, since that’s my first.
On Nov 15, 2014, at 9:12 AM, Garth Grimm 
garthgr...@averyranchconsulting.commailto:garthgr...@averyranchconsulting.com
 wrote:

I see the same issue on 4.10.1.

I’ll open a JIRA if I don’t see one.

I guess the best immediate work around is to copy the unique field, and use 
that field for renaming?
On Nov 15, 2014, at 3:18 AM, Suchi Amalapurapu 
su...@bloomreach.commailto:su...@bloomreach.com wrote:

Solr version:4.6.1

On Sat, Nov 15, 2014 at 12:24 PM, Jeon Woosung 
jeonwoos...@gmail.commailto:jeonwoos...@gmail.com
wrote:

Could you let me know version of the solr?

On Sat, Nov 15, 2014 at 5:05 AM, Suchi Amalapurapu 
su...@bloomreach.commailto:su...@bloomreach.com
wrote:

Hi
Getting the following exception when using fl renaming with unique key in
the schema.
http://host_name/solr/collection_name/select?q=dress&fl=a1:p1

where p1 is the unique key for collection_name
For collections with single shard, this works flawlessly but results in
the
following exception in case of multiple shards.

How do we fix this? Stack trace below.
Suchi

error: {trace: java.lang.NullPointerException\n\tat


org.apache.solr.handler.component.QueryComponent.returnFields(QueryComponent.java:998)\n\tat


org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:653)\n\tat


org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:628)\n\tat


org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:311)\n\tat


org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)\n\tat


org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:721)\n\tat


org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:417)\n\tat


org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)\n\tat


org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)\n\tat


org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)\n\tat


org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)\n\tat


org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)\n\tat


org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)\n\tat


org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)\n\tat


org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)\n\tat


org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)\n\tat


org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)\n\tat


org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)\n\tat


org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)\n\tat


org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)\n\tat


org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:368)\n\tat


org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)\n\tat


org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)\n\tat


org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)\n\tat


org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)\n\tat
org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)\n\tat

org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)\n\tat


org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)\n\tat


org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)\n\tat


org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)\n\tat


org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)\n\tat
java.lang.Thread.run(Thread.java:662)\n,code: 500




--
*God bless U*





Re: Different ids for the same document in different replicas.

2014-11-13 Thread Garth Grimm
OK.  So it sounds like doctorURL is a good key, but you don’t like the special 
characters.  I’ve used MD5 hashes of URLs before as a way to convert unique 
URLs into unique alphanumeric strings in a repeatable way.  I think most 
programming languages contain libraries for doing that as you feed the data to 
Solr (Java certainly does).  Other hashing or encoding mechanisms could be used 
if you wanted to be able to programmatically convert from the doctorURL to the 
string you want to use and back again.

Anyway, the point there being that you have a repeatable unique key that is 
derived directly from the data you’re storing.  Not a random ID value that will 
be different every time you feed the same thing in.

BTW, you can certainly use a custom field type to do the hashing work, but I’d 
suggest you do that before feeding the data to SolrCloud.  If you do it outside 
of SolrCloud, then SolrCloud can use it for routing to the correct shard.  If 
you try to do it solely in a field type, the field type output won’t be 
available until the indexing is actually occurring, which is too late for 
routing purposes.  And that means you can’t ensure that subsequent re-feeds of 
the same thing will overwrite the old values since you can’t make sure they get 
routed to the same shard.

 On Nov 12, 2014, at 7:50 PM, Meraj A. Khan mera...@gmail.com wrote:
 
 Sorry, it's actually doctorUrl, so I don't want to use doctorUrl as a lookup
 mechanism because urls can have special characters that can cause issues
 with Solr lookup.
 
 I guess I should rephrase my question to: how to auto-generate the unique
 keys in the id field when using SolrCloud?
 On Nov 12, 2014 7:28 PM, Garth Grimm garthgr...@averyranchconsulting.com
 wrote:
 
 You mention you already have a unique Key identified for the data you’re
 storing in Solr:
 
 uniqueKeydoctorIduniquekey
 
 If that’s the field you’re using to uniquely identify each thing you’re
 storing in the solr index, why do you want to have an id field that is
 populated with some random value?  You’ll be using the doctorId field as
 the key, and the id field will have no real meaning in your Data Model.
 
 If doctorId actually isn’t unique to each item you plan on storing in
 Solr, is there any other field that is?  If so, use that field as your
 unique key.
 
 Remember, this uniqueKeys are usually used for routing documents to shards
 in SolrCloud, and are used to ensure that later updates of the same “thing”
 overwrite the old one, rather than generating multiple copies.  So the keys
 really should be something derived from the data your storing.  I’m not
 sure if I understand why you would want to have the key randomly generated.
 
 On Nov 12, 2014, at 6:39 PM, S.L simpleliving...@gmail.com wrote:
 
 Just tried  adding  uniqueKeyid/uniqueKey while keeping id type=
 string only blank ids are being generated ,looks like the id is being
 auto generated only if the the id is set to  type uuid , but in case of
 SolrCloud this id will be unique per replica.
 
 Is there a  way to generate a unique id both in case of SolrCloud with
 out
 using the uuid type or not having a per replica unique id?
 
 The uuid in question is of type .
 
  <fieldType name="uuid" class="solr.UUIDField" indexed="true" />
 
 
 On Wed, Nov 12, 2014 at 6:20 PM, S.L simpleliving...@gmail.com wrote:
 
 Thanks.
 
 So the issue here is I already have a uniqueKeydoctorIduniquekey
 defined in my schema.xml.
 
 If along with that I also want the id/id field to be automatically
 generated for each document do I have to declare it as a uniquekey as
 well , because I just tried the following setting without the uniqueKey
 for
 id and its only generating blank ids for me.
 
 *schema.xml*
 
    <field name="id" type="string" indexed="true" stored="true"
    required="true" multiValued="false" />
 
  *solrconfig.xml*
 
  <updateRequestProcessorChain name="uuid">
 
    <processor class="solr.UUIDUpdateProcessorFactory">
      <str name="fieldName">id</str>
    </processor>
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>
 
 
 On Tue, Nov 11, 2014 at 7:47 PM, Garth Grimm 
 garthgr...@averyranchconsulting.com wrote:
 
 Looking a little deeper, I did find this about UUIDField
 
 
 
 http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/schema/UUIDField.html
 
 NOTE: Configuring a UUIDField instance with a default value of NEW
 is
 not advisable for most users when using SolrCloud (and not possible if
 the
 UUID value is configured as the unique key field) since the result
 will be
 that each replica of each document will get a unique UUID value. Using
 UUIDUpdateProcessorFactory
 
 http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/update/processor/UUIDUpdateProcessorFactory.html
 
 to generate UUID values when documents are added is recomended
 instead.”
 
 That might describe the behavior you saw.  And the use of
 UUIDUpdateProcessorFactory to auto generate ID’s seems to be covered
 well

Re: Can we query on _version_field ?

2014-11-13 Thread Garth Grimm
So it sounds like you’re OK with using the docURL as the unique key for routing 
in SolrCloud, but you don’t want to use it as a lookup mechanism.

If you don’t want to do a hash of it and use that unique value in a second 
unique field at feed time,
and you can’t seem to find any other field that might be unique,
and you don’t want to make your own UpdateRequestProcessorChain that would 
generate a unique field from your unique key (such as by doing an MD5 hash),
you might look at the UpdateRequestProcessorChain named “dedupe” in the OOB 
solrconfig.xml.  It’s primarily designed to help dedupe results, but its 
technique is to concatenate multiple fields together to create a signature that 
will be unique in some way.  So instead of having to find one field in your 
data that’s unique, you could look for a couple of fields that, if combined, 
would create a unique field, and configure the “dedupe” Processor to handle 
that.
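For reference, the stock example in solrconfig.xml looks roughly like this (the 
"fields" list is the part you'd change to whatever combination of fields is 
unique together for your data):

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">id</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">name,features,cat</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>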


 On Nov 13, 2014, at 12:02 PM, S.L simpleliving...@gmail.com wrote:
 
 I am not sure if this a case of XY problem.
 
 I have no control over the URLs to deduce an id from them , those are from
 www, I made the URL the uniqueKey , that way the document gets replaced
 when a new document with that URL comes in .
 
 To do the detail look up I can either use the same docURL as it is , or
 try and generate a unique id filed for each document.
 
 For the later option UUID is not behaving as expected in SolrCloud and
 _version_ field seems to be serving the need .
 
 On Thu, Nov 13, 2014 at 11:35 AM, Shawn Heisey apa...@elyograg.org wrote:
 
 On 11/12/2014 10:45 PM, S.L wrote:
 We know that _version_field is a mandatory field in solrcloud schema.xml,
 it is expected to be of type long , it also seems to have unique value
 in a
 collection.
 
 However the query of the form
 
  http://server1.mydomain.com:7344/solr/collection1/select/?q=*:*&fq=%28_version_:148463254894438%29&wt=json
 does not seems to return any record , can we query on the _version_field
 in
 the schema.xml ?
 
 I've been watching your journey unfold on the mailing list.  The whole
 thing seems like an XY problem.
 
 If I'm reading everything correctly, you want to have a unique ID value
 that can serve as the uniqueKey, as well as a way to quickly look up a
 single document in Solr.
 
 Is there one part of the URL that serves as a unique identifier that
 doesn't contain special characters?  It seems insane that you would not
 have a unique ID value for every entity in your system that is composed
 of only regular characters.
 
 Assuming that such an ID exists (and is likely used as one piece of that
 doctorURL that you mentioned) ... if you can extract that ID value into
 its own field (either in your indexing code or a custom update
 processor), you could use that for both uniqueKey and single-document
 lookups.  Having that kind of information in your index seems like a
 generally good idea.
 
 Thanks,
 Shawn
 
 



Re: Different ids for the same document in different replicas.

2014-11-12 Thread Garth Grimm
You mention you already have a unique Key identified for the data you’re 
storing in Solr:

 <uniqueKey>doctorId</uniqueKey>

If that’s the field you’re using to uniquely identify each thing you’re storing 
in the solr index, why do you want to have an id field that is populated with 
some random value?  You’ll be using the doctorId field as the key, and the id 
field will have no real meaning in your Data Model.

If doctorId actually isn’t unique to each item you plan on storing in Solr, is 
there any other field that is?  If so, use that field as your unique key.

Remember, these uniqueKeys are usually used for routing documents to shards in 
SolrCloud, and are used to ensure that later updates of the same “thing” 
overwrite the old one, rather than generating multiple copies.  So the keys 
really should be something derived from the data you’re storing.  I’m not sure 
I understand why you would want to have the key randomly generated.

 On Nov 12, 2014, at 6:39 PM, S.L simpleliving...@gmail.com wrote:
 
 Just tried adding <uniqueKey>id</uniqueKey> while keeping the id type as
 string; only blank ids are being generated. It looks like the id is being
 auto generated only if the id is set to type uuid, but in the case of
 SolrCloud this id will be unique per replica.
 
 Is there a way to generate a unique id in SolrCloud without
 using the uuid type, and without having a per-replica unique id?
 
 The uuid field type in question is:
 
 <fieldType name="uuid" class="solr.UUIDField" indexed="true" />
 
 
 On Wed, Nov 12, 2014 at 6:20 PM, S.L simpleliving...@gmail.com wrote:
 
 Thanks.
 
 So the issue here is I already have a <uniqueKey>doctorId</uniqueKey>
 defined in my schema.xml.
 
 If along with that I also want the id field to be automatically
 generated for each document, do I have to declare it as a uniqueKey as
 well?  I just tried the following setting without the uniqueKey for
 id and it’s only generating blank ids for me.
 
 *schema.xml*
 
<field name="id" type="string" indexed="true" stored="true"
required="true" multiValued="false" />
 
 *solrconfig.xml*
 
  <updateRequestProcessorChain name="uuid">
 
    <processor class="solr.UUIDUpdateProcessorFactory">
      <str name="fieldName">id</str>
    </processor>
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>
 
 
 On Tue, Nov 11, 2014 at 7:47 PM, Garth Grimm 
 garthgr...@averyranchconsulting.com wrote:
 
 Looking a little deeper, I did find this about UUIDField
 
 
 http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/schema/UUIDField.html
 
 “NOTE: Configuring a UUIDField instance with a default value of NEW is
 not advisable for most users when using SolrCloud (and not possible if the
 UUID value is configured as the unique key field) since the result will be
 that each replica of each document will get a unique UUID value. Using
 UUIDUpdateProcessorFactory
 http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/update/processor/UUIDUpdateProcessorFactory.html
 to generate UUID values when documents are added is recommended instead.”
 
 That might describe the behavior you saw.  And the use of
 UUIDUpdateProcessorFactory to auto generate ID’s seems to be covered well
 here:
 
 
 http://solr.pl/en/2013/07/08/automatically-generate-document-identifiers-solr-4-x/
 
 Though I’ve not actually tried that process before.
 
 On Nov 11, 2014, at 7:39 PM, Garth Grimm 
 garthgr...@averyranchconsulting.com wrote:
 
 “uuid” isn’t an out of the box field type that I’m familiar with.
 
 Generally, I’d stick with the out of the box advice of the schema.xml
 file, which includes things like….
 
  <!-- Only remove the id field if you have a very good reason to. While not strictly
       required, it is highly recommended. A uniqueKey is present in almost all Solr
       installations. See the uniqueKey declaration below where uniqueKey is set to id.
  -->
  <field name="id" type="string" indexed="true" stored="true"
         required="true" multiValued="false" />
 
 and…
 
 <!-- Field to use to determine and enforce document uniqueness.
      Unless this field is marked with required="false", it will be a required field
 -->
 <uniqueKey>id</uniqueKey>
 
 If you’re creating some key/value pair with uuid as the key as you feed
 documents in, and you know that the uuid values you’re creating are unique,
 just change the field name and unique key name from ‘id’ to ‘uuid’.  Or
 change the key name you send in from ‘uuid’ to ‘id’.
 
 On Nov 11, 2014, at 7:18 PM, S.L simpleliving...@gmail.com wrote:
 
 Hi All,
 
 I am seeing interesting behavior on the replicas. I have a single
 shard and 6 replicas on SolrCloud 4.10.1. I only have a small
 number of documents, ~375, that are replicated across the six replicas.
 
 The interesting thing is that the same document has a different id in
 each one of those replicas.
 
 This is causing the fq(id:xyz) type queries to fail

Re: Different ids for the same document in different replicas.

2014-11-11 Thread Garth Grimm
“uuid” isn’t an out of the box field type that I’m familiar with.

Generally, I’d stick with the out of the box advice of the schema.xml file, 
which includes things like….

   <!-- Only remove the id field if you have a very good reason to. While not strictly
        required, it is highly recommended. A uniqueKey is present in almost all Solr
        installations. See the uniqueKey declaration below where uniqueKey is set to id.
   -->
   <field name="id" type="string" indexed="true" stored="true" required="true"
          multiValued="false" />

and…

 <!-- Field to use to determine and enforce document uniqueness.
      Unless this field is marked with required="false", it will be a required field
 -->
 <uniqueKey>id</uniqueKey>

If you’re creating some key/value pair with uuid as the key as you feed 
documents in, and you know that the uuid values you’re creating are unique, 
just change the field name and unique key name from ‘id’ to ‘uuid’.  Or change 
the key name you send in from ‘uuid’ to ‘id’.

On Nov 11, 2014, at 7:18 PM, S.L simpleliving...@gmail.com wrote:

 Hi All,
 
 I am seeing interesting behavior on the replicas. I have a single
 shard and 6 replicas on SolrCloud 4.10.1. I only have a small
 number of documents, ~375, that are replicated across the six replicas.
 
 The interesting thing is that the same document has a different id in
 each one of those replicas.
 
 This is causing the fq(id:xyz) type queries to fail, depending on
 which replica the query goes to.
 
 I have specified the id field in the following manner in schema.xml;
 is it the right way to specify an auto generated id in SolrCloud?
 
<field name="id" type="uuid" indexed="true" stored="true"
required="true" multiValued="false" />
 
 
 Thanks.



Re: Different ids for the same document in different replicas.

2014-11-11 Thread Garth Grimm
Looking a little deeper, I did find this about UUIDField

http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/schema/UUIDField.html

“NOTE: Configuring a UUIDField instance with a default value of NEW is not 
advisable for most users when using SolrCloud (and not possible if the UUID 
value is configured as the unique key field) since the result will be that each 
replica of each document will get a unique UUID value. Using 
UUIDUpdateProcessorFactory
http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/update/processor/UUIDUpdateProcessorFactory.html
 to generate UUID values when documents are added is recommended instead.”

That might describe the behavior you saw.  And the use of 
UUIDUpdateProcessorFactory to auto generate ID’s seems to be covered well here:

http://solr.pl/en/2013/07/08/automatically-generate-document-identifiers-solr-4-x/

Though I’ve not actually tried that process before.
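
For completeness, a minimal, untested sketch of that approach (the article covers it in more detail; the handler shown is assumed to be the stock /update handler):

<!-- schema.xml: id stays a plain string field -->
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<uniqueKey>id</uniqueKey>

<!-- solrconfig.xml: assign the UUID once, on the way in -->
<updateRequestProcessorChain name="uuid">
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">uuid</str>
  </lst>
</requestHandler>

Note that defining the chain alone isn’t enough; it only runs if the update handler actually references it, as the update.chain default above does.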

On Nov 11, 2014, at 7:39 PM, Garth Grimm 
garthgr...@averyranchconsulting.com wrote:

“uuid” isn’t an out of the box field type that I’m familiar with.

Generally, I’d stick with the out of the box advice of the schema.xml file, 
which includes things like….

  <!-- Only remove the id field if you have a very good reason to. While not strictly
       required, it is highly recommended. A uniqueKey is present in almost all Solr
       installations. See the uniqueKey declaration below where uniqueKey is set to id.
  -->
  <field name="id" type="string" indexed="true" stored="true" required="true"
         multiValued="false" />

and…

<!-- Field to use to determine and enforce document uniqueness.
     Unless this field is marked with required="false", it will be a required field
-->
<uniqueKey>id</uniqueKey>

If you’re creating some key/value pair with uuid as the key as you feed 
documents in, and you know that the uuid values you’re creating are unique, 
just change the field name and unique key name from ‘id’ to ‘uuid’.  Or change 
the key name you send in from ‘uuid’ to ‘id’.

On Nov 11, 2014, at 7:18 PM, S.L 
simpleliving...@gmail.com wrote:

Hi All,

I am seeing interesting behavior on the replicas. I have a single
shard and 6 replicas on SolrCloud 4.10.1. I only have a small
number of documents, ~375, that are replicated across the six replicas.

The interesting thing is that the same document has a different id in
each one of those replicas.

This is causing the fq(id:xyz) type queries to fail, depending on
which replica the query goes to.

I have specified the id field in the following manner in schema.xml;
is it the right way to specify an auto generated id in SolrCloud?

  <field name="id" type="uuid" indexed="true" stored="true"
  required="true" multiValued="false" />


Thanks.




Re: eDismax - boost function of multiple values

2014-10-17 Thread Garth Grimm
-8149-08e64c107537,
house_no_from:18,
longitude:8.435313,
_version_:1481861578588422158},
  {
zip:76131,
inhabitants:296033,
city:Karlsruhe,
importance:1,
latitude:49.0079486,
latlong:49.0079486,8.4139096,
city_appendix:, Baden,
street:Am Künstlerhaus,
house_no_to:,
suburb:Innenstadt-Ost,
id:7f000101-4908-1bdd-8149-08e64c107538,
house_no_from:,
longitude:8.4139096,
_version_:1481861578589470720},
  {
zip:76131,
inhabitants:296033,
city:Karlsruhe,
importance:1,
latitude:49.0184689,
latlong:49.0184689,8.4070077,
city_appendix:, Baden,
street:An der Fasanengartenmauer,
house_no_to:,
suburb:Innenstadt-Ost,
id:7f000101-4908-1bdd-8149-08e64c107539,
house_no_from:,
longitude:8.4070077,
_version_:1481861578589470721}]
  }}

I can't see any difference between boosting only inhabitants and boosting inhabitants and 
importance together.
I expected the first result to be the city Karlsruhe, with 296k inhabitants and 
an importance value of 10.


Garth Grimm 
garthgr...@averyranchconsulting.com
 wrote on Thursday, 16 October 2014 at 16:40:


Spaces should work just fine.  Can you show us exactly what is happening with 
the score that leads you to the conclusion that it isn’t working?

Some testing from an example collection I have…

No boost:
http://localhost:8983/solr/collection1/select?q=text%3Abook&fl=id%2Cprice%2Cyearpub%2Cscore&wt=csv&defType=edismax
id,price,yearpub,score
db9780819562005,13.21,1989,0.40321594
db1562399055,17.87,2001,0.28511673
db0072519096,66.67,2008,0.28511673
db0140236392,10.88,1994,0.28511673
db04,44.99,2007,0.25200996
db07,19.77,2005,0.25200996
db0763777595,24.44,2002,0.25200996
db0879305835,43.58,2011,0.24947715
db1933550309,18.99,2004,0.24691834
db02,40.09,2009,0.21383755
Boost of just yearpub:

http://localhost:8983/solr/collection1/select?q=text%3Abook&fl=id%2Cprice%2Cyearpub%2Cscore&wt=csv&defType=edismax&bf=ord%28yearpub%29
id,price,yearpub,score
db0879305835,43.58,2011,11.069619
db1847195881,33.62,2010,10.635455
db02,40.09,2009,10.233932
db0072519096,66.67,2008,9.897689
db0316033723,23.1,2008,9.821208
db04,44.99,2007,9.465844
db05,44.99,2007,9.419684
db9780061336461,12.18,2007,9.398244
db07,19.77,2005,8.662797
db1933550309,18.99,2004,8.256611
boost of yearpub and price, using just a space as separator:
http://localhost:8983/solr/collection1/select?q=text%3Abook&fl=id%2Cprice%2Cyearpub%2Cscore&wt=csv&defType=edismax&bf=ord%28yearpub%29%20ord%28price%29
id,price,yearpub,score
db0072519096,66.67,2008,28.933228
db0879305835,43.58,2011,28.15772
db04,44.99,2007,27.414654
db05,44.99,2007,27.371819
db02,40.09,2009,27.009602
db1847195881,33.62,2010,26.636993
db9780201896831,57.43,1997,24.749598
db0767914384,37.87,1997,22.835175
db0316033723,23.1,2008,21.037462
db0763777595,24.44,2002,19.58986
Score keeps increasing with each boost.

Regards,
Garth


 Hey Ahmet,

 thanks for your answer.
 I've read about this on the following page:
 http://wiki.apache.org/solr/FunctionQuery
 Using FunctionQuery point 3:
 The bf parameter actually takes a list of function queries separated by 
 whitespace and each with an optional boost.

 If I write it the way you suggested, the result is the same.
 Only inhabitants ranked up and importance will be ignored.

 greetings




 Ahmet Arslan iori...@yahoo.com wrote on Tuesday, 14 October 2014 at 20:26:



 Hi Jens,

 Where did you read that you can write it separated by white spaces?

 bq and bf can both be defined multiple times.

 q=foo&bf=ord(inhabitants)&bf=ord(importance)

 Ahmet




 On Tuesday, October 14, 2014 6:34 PM, Jens Mayer 
 mjen...@yahoo.com.INVALID wrote:
 Hey everyone,

 I have a question about the boost function of solr.
 The documentation says that multiple function queries can be written 
 separated by whitespace.

 Example: q=foo&bf=ord(popularity)^0.5 recip(rord(price),1,1000,1000)^0.3

 Now I have two fields I would like to boost: inhabitants and importance.
 The field inhabitants contains the inhabitants of cities, and the field 
 importance contains a priority value - cities have the value 10, suburbs the 
 value 5, and streets the value 1.
 If I use the bf parameter I can boost inhabitants so that cities with the most 
 inhabitants are ranked up.

 Example: q=foo&bf=ord(inhabitants)

 The same happens if I boost importance.

 Example: q=foo&bf=ord(importance)

 But if I try to combine both so that importance and inhabitants are ranked up,
 only inhabitants will be ranked up and importance will be ignored.

 Example: q=foo&bf=ord(inhabitants) ord(importance)

 Does anyone know how I can fix this problem?


 greetings





Re: Is there a way to prevent some keywords from being added to autosuggest dictionary?

2014-10-17 Thread Garth Grimm
What field(s) auto suggest uses is configurable.  So you could create special 
fields (and associated ‘copyField’ configs) to populate specific fields for 
auto suggest.

For example, you could have 2 fields for “hidden_desc” and “visible_desc”.  
Copy field both of them to a field named “description”.  Then set auto suggest 
to use only the “visible_desc” field to drive auto suggests.

That might be one viable option.

Regards,
Garth
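
A minimal sketch of that layout (field and suggester names are placeholders, and this assumes the SuggestComponent; the same idea applies to the older spellcheck-based suggester through its field setting):

<!-- schema.xml -->
<field name="visible_desc" type="text_general" indexed="true" stored="true" />
<field name="hidden_desc"  type="text_general" indexed="true" stored="true" />
<field name="description"  type="text_general" indexed="true" stored="false" multiValued="true" />
<copyField source="visible_desc" dest="description" />
<copyField source="hidden_desc"  dest="description" />

<!-- solrconfig.xml: the suggester only reads the visible field -->
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">visible_desc</str>
  </lst>
</searchComponent>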

On Oct 17, 2014, at 1:02 PM, bbarani bbar...@gmail.com wrote:

 We index around 10k documents in SOLR and use inbuilt suggest functionality
 for auto complete.
 
 We have a field that contain a flag that is used to show or hide the
 documents from search results. 
 
 I am trying to figure out a way to control the terms added to autosuggest
 index (to skip the documents from getting added to auto suggest index) based
 on the value of the flag. Is there a way to do that?
 
 
 
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Is-there-a-way-to-prevent-some-keywords-from-being-added-to-autosuggest-dictionary-tp4164699.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: eDismax - boost function of multiple values

2014-10-16 Thread Garth Grimm
Spaces should work just fine.  Can you show us exactly what is happening with 
the score that leads you to the conclusion that it isn’t working?

Some testing from an example collection I have…

No boost:
http://localhost:8983/solr/collection1/select?q=text%3Abook&fl=id%2Cprice%2Cyearpub%2Cscore&wt=csv&defType=edismax
id,price,yearpub,score
db9780819562005,13.21,1989,0.40321594
db1562399055,17.87,2001,0.28511673
db0072519096,66.67,2008,0.28511673
db0140236392,10.88,1994,0.28511673
db04,44.99,2007,0.25200996
db07,19.77,2005,0.25200996
db0763777595,24.44,2002,0.25200996
db0879305835,43.58,2011,0.24947715
db1933550309,18.99,2004,0.24691834
db02,40.09,2009,0.21383755
Boost of just yearpub:

http://localhost:8983/solr/collection1/select?q=text%3Abook&fl=id%2Cprice%2Cyearpub%2Cscore&wt=csv&defType=edismax&bf=ord%28yearpub%29
id,price,yearpub,score
db0879305835,43.58,2011,11.069619
db1847195881,33.62,2010,10.635455
db02,40.09,2009,10.233932
db0072519096,66.67,2008,9.897689
db0316033723,23.1,2008,9.821208
db04,44.99,2007,9.465844
db05,44.99,2007,9.419684
db9780061336461,12.18,2007,9.398244
db07,19.77,2005,8.662797
db1933550309,18.99,2004,8.256611
boost of yearpub and price, using just a space as separator:
http://localhost:8983/solr/collection1/select?q=text%3Abook&fl=id%2Cprice%2Cyearpub%2Cscore&wt=csv&defType=edismax&bf=ord%28yearpub%29%20ord%28price%29
id,price,yearpub,score
db0072519096,66.67,2008,28.933228
db0879305835,43.58,2011,28.15772
db04,44.99,2007,27.414654
db05,44.99,2007,27.371819
db02,40.09,2009,27.009602
db1847195881,33.62,2010,26.636993
db9780201896831,57.43,1997,24.749598
db0767914384,37.87,1997,22.835175
db0316033723,23.1,2008,21.037462
db0763777595,24.44,2002,19.58986
Score keeps increasing with each boost.

Regards,
Garth
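
One more illustration of the same point: the wiki text quoted below also says each function can take an optional boost, so if one function's raw values swamp the other's, the two can be weighted explicitly, either in a single bf or as separate bf parameters (the weights here are made up):

http://localhost:8983/solr/collection1/select?q=text%3Abook&defType=edismax&bf=ord%28yearpub%29^0.5%20ord%28price%29^2.0

or equivalently

http://localhost:8983/solr/collection1/select?q=text%3Abook&defType=edismax&bf=ord%28yearpub%29^0.5&bf=ord%28price%29^2.0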

 Hey Ahmet,
 
 thanks for your answer.
 I've read about this on the following page:
 http://wiki.apache.org/solr/FunctionQuery 
 Using FunctionQuery point 3:
 The bf parameter actually takes a list of function queries separated by 
 whitespace and each with an optional boost.
 
 If I write it the way you suggested, the result is the same.
 Only inhabitants ranked up and importance will be ignored.
 
 greetings
 
 
 
 
 Ahmet Arslan iori...@yahoo.com wrote on Tuesday, 14 October 2014 at 20:26:
 
 
 
 Hi Jens,
 
 Where did you read that you can write it separated by white spaces?
 
 bq and bf can both be defined multiple times.
 
 q=foo&bf=ord(inhabitants)&bf=ord(importance)
 
 Ahmet
 
 
 
 
 On Tuesday, October 14, 2014 6:34 PM, Jens Mayer mjen...@yahoo.com.INVALID 
 wrote:
 Hey everyone,
 
 I have a question about the boost function of solr.
 The documentation says that multiple function queries can be written 
 separated by whitespace.
 
 Example: q=foo&bf=ord(popularity)^0.5 recip(rord(price),1,1000,1000)^0.3
 
 Now I have two fields I would like to boost: inhabitants and importance.
 The field inhabitants contains the inhabitants of cities, and the field 
 importance contains a priority value - cities have the value 10, suburbs the 
 value 5, and streets the value 1.
 If I use the bf parameter I can boost inhabitants so that cities with the most 
 inhabitants are ranked up.
 
 Example: q=foo&bf=ord(inhabitants)
 
 The same happens if I boost importance.
 
 Example: q=foo&bf=ord(importance)
 
 But if I try to combine both so that importance and inhabitants are ranked up,
 only inhabitants will be ranked up and importance will be ignored.
 
 Example: q=foo&bf=ord(inhabitants) ord(importance)
 
 Does anyone know how I can fix this problem?
 
 
 greetings



RE: [ANN] Lucidworks Fusion 1.0.0

2014-10-05 Thread Garth Grimm
Well, the current release is only supported on Linux.  A Windows compatible 
release is planned for later this year.  

-Original Message-
From: Anurag Sharma [mailto:anura...@gmail.com] 
Sent: Sunday, October 05, 2014 12:23 PM
To: solr-user@lucene.apache.org
Subject: Re: [ANN] Lucidworks Fusion 1.0.0

I downloaded fusion and tried to run it on windows 8 using cygwin. It's giving 
Error: Unable to access jarfile /home/user1/fusion/jetty/home/start.jar. Also 
tried changing the permission of jar, .sh and all folder/subfolders in fusion 
to 777 but still getting the same error.

Please share your experience if tried running fusion on windows 8 or facing the 
above issue on other port.

Thanks
Anurag

On Mon, Sep 29, 2014 at 6:05 AM, Aman Tandon amantandon...@gmail.com
wrote:

 Hi,

 How we can see the demo for NLP?
 On Sep 24, 2014 4:43 PM, Grant Ingersoll gsing...@apache.org wrote:

  Hi Thomas,
 
  Thanks for the question, yes, I give a brief demo of it in action 
  during my talk and we will have demos at our booth.  I will also 
  give a demo during the Webinar, which will be recorded.  As others 
  have said as well, you can simply download it and try yourself.
 
  Cheers,
  Grant
 
  On Sep 23, 2014, at 2:00 AM, Thomas Egense thomas.ege...@gmail.com
  wrote:
 
   Hi Grant.
   Will there be a Fusion demostration/presentation  at Lucene/Solr
  Revolution
   DC? (Not listed in the program yet).
  
  
   Thomas Egense
  
   On Mon, Sep 22, 2014 at 3:45 PM, Grant Ingersoll 
   gsing...@apache.org
   wrote:
  
   Hi All,
  
   We at Lucidworks are pleased to announce the release of 
   Lucidworks
  Fusion
   1.0.   Fusion is built to overlay on top of Solr (in fact, you can
  manage
   multiple Solr clusters -- think QA, staging and production -- all 
   from
  our
   Admin).In other words, if you already have Solr, simply point
  Fusion at
   your instance and get all kinds of goodies like Banana ( 
   https://github.com/LucidWorks/Banana -- our port of Kibana to 
   Solr +
 a
   number of extensions that Kibana doesn't have), collaborative
 filtering
   style recommendations (without the need for Hadoop or Mahout!), a
 modern
   signal capture framework, analytics, NLP integration,
 Boosting/Blocking
  and
   other relevance tools, flexible index and query time pipelines as 
   well
  as a
   myriad of connectors ranging from Twitter to web crawling to
 Sharepoint.
   The best part of all this?  It all leverages the infrastructure 
   that
 you
   know and love: Solr.  Want recommendations?  Deploy more Solr.  
   Want
 log
   analytics?  Deploy more Solr.  Want to track important system metrics?
   Deploy more Solr.
  
   Fusion represents our commitment as a company to continue to
 contribute
  a
   large quantity of enhancements to the core of Solr while 
   complementing
  and
   extending those capabilities with value adds that integrate a 
   number
 of
  3rd
   party (e.g connectors) and home grown capabilities like an all 
   new, responsive UI built in AngularJS.  Fusion is not a fork of 
   Solr.  We
 do
  not
   hide Solr in any way.  In fact, our goal is that your existing
  applications
   will work out of the box with Fusion, allowing you to take 
   advantage
 of
  new
   capabilities w/o overhauling your existing application.
  
   If you want to learn more, please feel free to join our technical
  webinar
   on October 2:
  http://lucidworks.com/blog/say-hello-to-lucidworks-fusion/.
   If you'd like to download: http://lucidworks.com/product/fusion/.
  
   Cheers,
   Grant Ingersoll
  
   
   Grant Ingersoll | CTO
   gr...@lucidworks.com | @gsingers
   http://www.lucidworks.com
  
  
 
  
  Grant Ingersoll | @gsingers
  http://www.lucidworks.com
 
 
 
 
 
 



RE: Solr Cloud Query Scaling

2014-01-09 Thread Garth Grimm
As a follow-up question on this

One would want to use some kind of load balancing 'above' the SolrCloud 
installation for search queries, correct?  To ensure that the initial requests 
would get distributed evenly to all nodes?

If you don't have that, and send all requests to M2S2 (IRT OP), it would be the 
only node that would ever act as controller, and it could become a bottleneck 
that further replicas won't be able to alleviate.  Correct?

Or is there something in the SolrCloud itself that even distributes the 
controller role, regardless of which node the query initially arrives at?

-Original Message-
From: Tim Potter [mailto:tim.pot...@lucidworks.com] 
Sent: Thursday, January 09, 2014 12:28 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr Cloud Query Scaling

Absolutely adding replicas helps you scale query load. Queries do not need to 
be routed to leaders; they can be handled by any replica in a shard. Leaders 
are only needed for handling update requests.

In general, a distributed query has two phases, driven by a controller node 
(what you called collator below). The controller is the Solr that received the 
query request from the client. In Phase 1, the controller distributes the query 
to one of the replicas for all shards and receives back the list of matching 
document IDs from each replica (only a page worth btw). 

The controller merges the results and sorts them to generate a final page of 
results to be returned to the client. In Phase 2, the controller collects all 
the fields from the documents to generate the final result set by querying the 
replicas involved in Phase 1.

The controller uses SolrJ's LBSolrServer to query the shards in Phase 1 so you 
get some basic load-balancing amongst replicas for a shard. I've not done any 
research to see how balanced that selection process is in production but I 
suspect if you have 3 replicas in a shard, then roughly 1/3 of the queries go 
to each.

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com
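
If you want to check how those internal requests actually get spread around, the shards.info parameter is handy (a sketch; the host and collection name below are just placeholders):

http://index1:8080/solr/collection1/select?q=*:*&shards.info=true

The shards.info section of the response lists, for each shard, the exact replica URL that answered along with its hit count and elapsed time, so over a number of requests you can see how evenly the replicas are being picked.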


From: Sir Gilligan sirgilli...@yahoo.com
Sent: Thursday, January 09, 2014 11:02 AM
To: solr-user@lucene.apache.org
Subject: Solr Cloud Query Scaling

Question: Does adding replicas help with query load?

Scenario: 3 Physical Machines. 3 Shards
Query any machine, get results. Standard Solr Cloud stuff.

Update Scenario: 6 Physical Machines. 3 Shards.
M = Machine, S = Shard, -L = Leader
M1S1-L
M2S2
M3S3
M4S1
M5S2-L
M6S3-L

Incoming Query to M2S2. How will Solr Cloud (4.6.0) distribute the query?
Will M2S2 handle the query for shard 2? Or, will it send it to the leader of S2 
which is M5S2?
When the query is distributed, will it send it to the other leaders? OR, will 
it send it to any shard?
Specifically:
Query sent to M2S2. Solr Cloud distributes the query. Could it possibly send 
the query on to M3S3 and M4S1? Some kind of query load balance functionality 
(maybe like a round robin to the shard members).
OR will M2S2 just be the collator, and send the query to the leaders?
OR something different that I have not described?

If queries do not have to be processed by leaders then we could add three more 
physical machines (now total 9 machines) and handle more query load.

Thank you.


Zookeeper down question

2013-11-19 Thread Garth Grimm
Given a 4 solr node instance (i.e. 2 shards, 2 replicas per shard), and a 
standalone zookeeper.

Correct me if any of my understanding is incorrect on the following:
If ZK goes down, most normal operations will still function, since my 
understanding is that ZK isn't involved on a transaction by transaction basis 
for each of these.
Document adds, updates, and deletes on existing collection will still work as 
expected.
Queries will still get processed as expected.
Is the above correct?

But adding new collections, changing configs, etc., will all fail while ZK is 
down (or at least, place things in an inconsistent state?)
Is that correct?

If, while ZK is down, one of the 4 solr nodes also goes down, will all normal 
operations fail?  Will they all continue to succeed?  I.e. will each of the 
nodes realize which node is down and route indexing and query requests around 
them, or is that impossible while ZK is down?  Will some queries succeed 
(because they were lucky enough to get routed to the one replica on the one 
shard that is still functional) while other queries fail (they aren't so lucky 
and get routed to the one replica that is down on the one shard)?

Thanks,
Garth Grimm




hung solr instance behavior

2013-11-19 Thread Garth Grimm
Given a 4 node Solr Cloud (i.e. 2 shards, 2 replicas per shard).

Let's say one node becomes 'nonresponsive'.  Meaning sockets get created, but 
transactions to them don't get handled (i.e. they time out).  We'll also assume 
that means the solr instance can't send information out to zookeeper or other 
solr instances.

Does ZK become aware of the issue at all?
Do normal indexing operations fail (I would assume so based on a timeout, but 
just checking)?
What would happen with query requests (let's assume the requests aren't sent 
directly to the 'hung' instance).  Do some queries succeed, but others fail 
(i.e. timeout) based upon whether the node in the shard asked to handle the 
query is the 'hung' one or not?  Is there an automatic timeout functionality 
where all queries will still succeed, but some will be much slower(i.e. if the 
'hung' one is asked to handle it, there'll be a timeout and then the other core 
on the shard will be asked to handle it)?

Thanks,
Garth


RE: Zookeeper down question

2013-11-19 Thread Garth Grimm
Thanks Mark and Tim.  My understanding has been upgraded.

-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: Tuesday, November 19, 2013 1:59 PM
To: solr-user@lucene.apache.org
Subject: Re: Zookeeper down question


On Nov 19, 2013, at 2:24 PM, Timothy Potter thelabd...@gmail.com wrote:

 Good questions ... From my understanding, queries will work if Zk goes 
 down but writes do not work w/o Zookeeper. This works because the 
 clusterstate is cached on each node so Zookeeper doesn't participate 
 directly in queries and indexing requests. Solr has to decide not to 
 allow writes if it loses its connection to Zookeeper, which is a safe 
 guard mechanism. In other words, Solr assumes it's pretty safe to 
 allow reads if the cluster doesn't have a healthy coordinator, but chooses to 
 not allow writes to be safe.

Right - we currently stop accepting writes when Solr cannot talk to ZooKeeper - 
this is because we can no longer count on knowing about any changes to the 
cluster and no new leaders can be elected, etc. It gets tricky fast if you 
consider allowing updates without ZooKeeper connectivity for very long.

 
 If a Solr nodes goes down while ZK is not available, since Solr no 
 longer accepts writes, leader / replica doesn't really matter. I'd 
 venture to guess there is some failover logic built in when executing 
 distributed queries but I'm not as familiar with that part of the 
 code (I'll brush up on it though as I'm now curious as well).

Right - query requests will fail over to other replicas - this is important in 
general because the cluster state a Solr instance has can be a bit stale - so a 
request might hit something that has gone down and another replica in the shard 
can be tried. We use the load balancing solrj client for these internal 
requests. CloudSolrServer handles failover for the user (or non internal) 
requests. Or you can use your own external load balancer.

- Mark

 
 Cheers,
 Tim
 
 
 On Tue, Nov 19, 2013 at 11:58 AM, Garth Grimm  
 garthgr...@averyranchconsulting.com wrote:
 
 Given a 4 solr node instance (i.e. 2 shards, 2 replicas per shard), 
 and a standalone zookeeper.
 
 Correct me if any of my understanding is incorrect on the following:
 If ZK goes down, most normal operations will still function, since my 
 understanding is that ZK isn't involved on a transaction by 
 transaction basis for each of these.
 Document adds, updates, and deletes on existing collection will still 
 work as expected.
 Queries will still get processed as expected.
 Is the above correct?
 
 But adding new collections, changing configs, etc., will all fail 
 while ZK is down (or at least, place things in an inconsistent 
 state?) Is that correct?
 
 If, while ZK is down, one of the 4 solr nodes also goes down, will 
 all normal operations fail?  Will they all continue to succeed?  I.e. 
 will each of the nodes realize which node is down and route indexing 
 and query requests around them, or is that impossible while ZK is 
 down?  Will some queries succeed (because they were lucky enough to 
 get routed to the one replica on the one shard that is still 
 functional) while other queries fail (they aren't so lucky and get 
 routed to the one replica that is down on the one shard)?
 
 Thanks,
 Garth Grimm
 
 
 



RE: Change config set for a collection

2013-10-17 Thread Garth Grimm
But if you're working with multiple configs in zookeeper, be aware that 4.5 
currently has an issue creating multiple collections in a cloud that has 
multiple configs.  It's targeted to be fixed whenever 4.5.1 comes out.

https://issues.apache.org/jira/i#browse/SOLR-5306


-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Thursday, October 17, 2013 10:24 AM
To: solr-user@lucene.apache.org
Subject: Re: Change config set for a collection

On 10/17/2013 2:36 AM, michael.boom wrote:
 The question also asked some 10 months ago in 
 http://lucene.472066.n3.nabble.com/SolrCloud-4-1-change-config-set-for
 -a-collection-td4037456.html, and then the answer was negative, but 
 here it goes again, maybe now it's different.
 
 Is it possible to change the config set of a collection using the 
 Collection API to another one (stored in zookeeper)? If not, is it 
 possible to do it using zkCli ?
 
 Also how can somebody check which config set a collection is using ?
 Thanks!

The zkcli command linkconfig should take care of that.  You'd need to reload 
the collection after making the change.  If you're using a version prior to 
4.4, reloading doesn't work, you need to restart Solr completely.
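
As a rough sketch (ZooKeeper address, paths, and names below are placeholders):

# from a Solr node, re-link the collection to a different config set already in ZooKeeper
cloud-scripts/zkcli.sh -zkhost zk1:2181 -cmd linkconfig -collection mycollection -confname newconfig

# then reload the collection so it picks up the new config (4.4 or later)
http://solrhost:8983/solr/admin/collections?action=RELOAD&name=mycollection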

You can see what config a collection is using with the Cloud-Tree section of 
the admin UI.  Open /collections and click on the collection.
 At the bottom of the right-hand window, it has a small JSON string with 
configName in it.  I don't know of a way to easily get this information from 
Solr with a program.  If your program is Java, you could very likely grab the 
zookeeper object from CloudSolrServer and find it that way, but I have no idea 
how to write that code.

Thanks,
Shawn



RE: Switching indexes

2013-10-17 Thread Garth Grimm
Go to the admin screen for Cloud/Tree, and then click the node for 
aliases.json.  To the lower right, you should see something like:

{"collection":{"AdWorksQuery":"AdWorks"}}

Or access the Zookeeper instance, and do a 'get /aliases.json'.
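
For example, with the stock ZooKeeper command line client (host and port assumed):

bin/zkCli.sh -server index1:2181
get /aliases.json

which prints the same JSON shown above.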

-Original Message-
From: Christopher Gross [mailto:cogr...@gmail.com] 
Sent: Thursday, October 17, 2013 2:40 PM
To: solr-user
Subject: Re: Switching indexes

Also, when I make an alias:
http://index1:8080/solr/admin/collections?action=CREATEALIAS&name=test1-alias&collections=test1

I get a pretty useless response:
<response><lst name="responseHeader"><int name="status">0</int><int 
name="QTime">0</int></lst></response>

So I'm not sure if it is made.  I tried going to:
http://index1:8080/solr/test1-alias/select?q=*:*
but that didn't work.  How do I use an alias when it gets made?


-- Chris


On Thu, Oct 17, 2013 at 2:51 PM, Christopher Gross cogr...@gmail.comwrote:

 OK, super confused now.


 http://index1:8080/solr/admin/cores?action=CREATE&name=test2&collection=test2&numshards=1&replicationFactor=3

 Nets me this:
 response
 lst name=responseHeader
 int name=status400/int
 int name=QTime15007/int
 /lst
 lst name=error
 str name=msgError CREATEing SolrCore 'test2': Could not find 
 configName for collection test2 found:[xxx, xxx, , x, 
 xx]/str int name=code400/int /lst /response

 For that node (test2), in my solr data directory, I have a folder with 
 the conf files and an existing data dir (copied the index from another 
 location).

 Right now it seems like the only way that I can add in a collection is 
 to load the configs into zookeeper, stop tomcat, add it to the 
 solr.xml file, and restart tomcat.

 Is there a primer that I'm missing for how to do this?

 Thanks.


 -- Chris


 On Wed, Oct 16, 2013 at 2:59 PM, Christopher Gross cogr...@gmail.comwrote:

 Thanks Shawn, the explanations help bring me forward to the SolrCloud
 mentality.

 So it sounds like going forward that I should have a more complicated 
 name (ex: coll1-20131015) aliased to coll1, to make it easier to 
 switch in the future.

 Now, if I already have an index (copied from one location to 
 another), it sounds like I should just remove my existing (bad/old 
 data) coll1, create the replicated one (calling it coll1-date), 
 then alias coll1 to that one.

 This type of information would have been awesome to know before I got 
 started, but I can make do with what I've got going now.

 Thanks again!


 -- Chris


 On Wed, Oct 16, 2013 at 2:40 PM, Shawn Heisey s...@elyograg.org wrote:

 On 10/16/2013 11:51 AM, Christopher Gross wrote:
  Ok, so I think I was confusing the terminology (still in a 3.X 
  mindset
 I
  guess.)
 
  From the Cloud-Tree, I do see that I have collections for what 
  I was calling core1, core2, etc.
 
  So, to redo the above,
  Servers: index1, index2, index3
  Collections: (on each) coll1, coll2 Collection (core?) on index1: 
  coll1new
 
  Each Collection has 1 shard (too small to make sharding worthwhile).
 
  So should I run something like this:
 
 http://index1:8080/solr/admin/collections?action=CREATEALIAS&name=coll1&collections=col11new
 
  Or will I need coll1new to be on each of the index1, index2 and 
  index3 instances of Solr?

 I don't think you can create an alias if a collection already exists 
 with that name - so having a collection named core1 means you 
 wouldn't want an alias named core1.  I could be wrong, but just to 
 keep things clean, I wouldn't recommend it, even if it's possible.

 That CREATEALIAS command will only work if coll1new shows up in 
 /collections and shows green on the cloud graph.  If it does, and 
 you're using an alias name that doesn't already exist as a 
 collection, then you're good.

 Whether coll1new is living on one server, two servers, or all three 
 servers doesn't matter for CREATEALIAS, or for most other 
 collection-related topics.  Any query or update can be sent to any 
 server in the cloud and it will be routed to the correct place 
 according to the clusterstate.

 Where things live and how many replicas there are *does* matter for 
 a discussion about redundancy.  Generally speaking, you're going to 
 want your shards to have at least two replicas, so that if a Solr 
 instance goes down, or is taken down for maintenance, your cloud 
 remains fully operational.  In your situation, you probably want 
 three replicas - so each collection lives on all three servers.

 So my general advice:

 Decide what name you want your application to use, make sure none of 
 your existing collections are using that name, and set up an alias 
 with that name pointing to whichever collection is current.  Then 
 change your application configurations or code to point at the alias 
 instead of directly at the collection.

 When you want to do your reindex, first create a new collection 
 using the collections API.  Index to that new collection.  When it's 
 ready to go, use CREATEALIAS to update the alias, and your 
 application will start 

RE: Switching indexes

2013-10-16 Thread Garth Grimm
I'd suggest using the Collections API:
http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=alias&collections=collection1,collection2...

See the Collections Aliases section of http://wiki.apache.org/solr/SolrCloud.

BTW, once you make the aliases, Zookeeper will have entries in /aliases.json 
that will tell you what aliases are defined and what they point to.
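
Once an alias shows up in /aliases.json, it can generally be addressed in request URLs just like a collection (names below are placeholders):

http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=query&collections=collection1
http://localhost:8983/solr/query/select?q=*:*

Re-running CREATEALIAS with the same name simply re-points the alias at a different collection.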

-Original Message-
From: Christopher Gross [mailto:cogr...@gmail.com] 
Sent: Wednesday, October 16, 2013 10:44 AM
To: solr-user
Subject: Re: Switching indexes

Garth,

I think I get what you're saying, but I want to make sure.

I have 3 servers (index1, index2, index3), with Solr living on port 8080.

Each of those has 3 cores loaded with data:
core1 (old version)
core1new (new version)
core2 (unrelated to core1)

If I wanted to make it so that queries to core1 are really going to core1new, 
I'd run:
http://index1:8080/solr/admin/cores?action=CREATEALIAS&name=core1&collections=core1new&shard=shard1

Correct?

-- Chris


On Wed, Oct 16, 2013 at 9:02 AM, Garth Grimm  
garthgr...@averyranchconsulting.com wrote:

 The alias applies to the entire cloud, not a single core.

 So you'd have your indexing application point to a collection alias
 named 'index'.  And that alias would point to core1.
 You'd have your query applications point to a collection alias named 
 'query', and that would point to core1, as well.

 Then use the Collection API to create core1new across the entire cloud.
  Then update the 'index' alias to point to core1new.  Feed documents 
 in, run warm-up scripts, run smoke tests, etc., etc.
 When you're ready, point the 'query' alias to core1new.

 You're now running completely on core1new, and can use the Collection 
 API to delete core1 from the cloud.  Or keep it around as a backup to 
 which you can restore simply by changing 'query' alias.
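
Spelled out as requests, that workflow might look roughly like this (host, collection, and alias names are placeholders, and the CREATE call may also need collection.configName depending on the setup):

# initial state: both aliases point at the current collection
http://index1:8080/solr/admin/collections?action=CREATEALIAS&name=index&collections=core1
http://index1:8080/solr/admin/collections?action=CREATEALIAS&name=query&collections=core1

# build the replacement, then feed and smoke-test it through the 'index' alias
http://index1:8080/solr/admin/collections?action=CREATE&name=core1new&numShards=1&replicationFactor=3
http://index1:8080/solr/admin/collections?action=CREATEALIAS&name=index&collections=core1new

# cut queries over, and optionally drop the old collection afterwards
http://index1:8080/solr/admin/collections?action=CREATEALIAS&name=query&collections=core1new
http://index1:8080/solr/admin/collections?action=DELETE&name=core1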

 -Original Message-
 From: Christopher Gross [mailto:cogr...@gmail.com]
 Sent: Wednesday, October 16, 2013 7:05 AM
 To: solr-user
 Subject: Re: Switching indexes

 Shawn,

 It all makes sense, I'm just dealing with production servers here so 
 I'm trying to be very careful (shutting down one node at a time is OK, 
 just don't want to do something catastrophic.)

 OK, so I should use that aliasing feature.

 On index1 I have:
 core1
 core1new
 core2

 On index2 and index3 I have:
 core1
 core2

 If I do the alias command on index1 and have core1 alias core1new:
 1) Will that then get rid of the existing core1 and have core1new 
 data be used for queries?
 2) Will that change make core1 instances on index2 and index3 update 
 to have core1new data?

 Thanks again!



 -- Chris


 On Tue, Oct 15, 2013 at 7:30 PM, Shawn Heisey s...@elyograg.org wrote:

  On 10/15/2013 2:17 PM, Christopher Gross wrote:
 
  I have 3 Solr nodes (and 5 ZK nodes).
 
  For #1, would I have to do that on all of them?
  For #2, I'm not getting the auto-replication between node 1 and 
  nodes
  2 
  3
  for my new index.
 
  I have 2 indexes -- just call them index and indexbk (bk being 
  the backup containing the full data set) up and running on one node.
  If I were to do a swap (via the Core Admin page), would that push 
  the changes for indexbk over to the other two nodes?  Would I need 
  to do that switch on the leader, or could that be done on one of 
  the other
 nodes?
 
 
  For #1, I don't know how you want to handle your sharding and/or 
  replication.  I would assume that you probably have numShards=1 and 
  replicationFactor=3, but I could be wrong. At any rate, where the 
  collection lives is an implementation detail that's up to you.
  SolrCloud keeps track of all your collections, whether they are on 
  one server or all servers. Typically you can send requests (queries, 
  API calls, etc) that deal with entire collections to any node in 
  your cluster and they will be handled correctly.  If you need to 
  deal with a specific core, that call needs to go to the correct node.
 
  For #2, when you create a core and want it to be a replica of 
  something that already exists, you need to give it a name that's not 
  in use on your cluster, such as index2_shard1_replica3.  You also 
  tell it what collection it's part of, which for my example, would 
  probably be index2.  Then you tell it what shard it will contain.  
  That will be
 shard1, shard2, etc.
   Here's an example of a CREATE call:
 
  http://server:port/solr/admin/cores?action=CREATE&name=index2_shard1_replica3&collection=index2&shard=shard1
 
  For the rest of your message: Core swapping and SolrCloud do NOT get 
  along.  If you are using SolrCloud, CoreAdmin features like that 
  need to disappear from your toolset. Attempting a core swap will 
  make bad things
  (tm) happen.
 
  Collection aliasing is the way in SolrCloud that you can now do what 
  used to be done with swapping.  You have collections named index1, 
  index2, index3, etc ... and you keep an alias called just index 
  that points to one