Re: Changing merge policy config on production

2017-12-15 Thread Erick Erickson
The merge rate will be limited by the number of merge threads. You'll merge
more often though so the load will change. That said, I wouldn't be
concerned unless you have a very high indexing rate.

Why do you want to change anyway? Unless you've tried the new settings in a
Dev environment, the biggest risk seems to me to be whether the new
settings are just plain bad in your situation rather than what the short
term effects are.
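
For reference, these knobs live on the merge policy factory in solrconfig.xml. A minimal sketch (the values shown are the defaults for TieredMergePolicy, and the factory class applies to Solr 6.x and later):

```xml
<indexConfig>
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <!-- how many segments may be merged at once -->
    <int name="maxMergeAtOnce">10</int>
    <!-- how many same-size segments are allowed per tier before a merge triggers -->
    <int name="segmentsPerTier">10</int>
  </mergePolicyFactory>
</indexConfig>
```

Lowering segmentsPerTier below the current segment count makes subsequent merges more aggressive, which is exactly why trying the new values in a dev environment first is worth the effort.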

On Dec 15, 2017 4:20 PM, "alexpusch"  wrote:

> Hi,
> Is it safe to change the mergePolicyFactory config on production servers?
> Specifically maxMergeAtOnce and segmentsPerTier. How will solr reconcile
> the
> current state of the segments with the new config? In case of setting
> segmentsPerTier to a lower number - will subsequent merges be particularly
> heavy and possibly cause performance issues?
>
> Thanks,
> Alex.
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>




Re: Solr ssl issue while creating collection

2017-12-15 Thread Erick Erickson
No. ZooKeeper is an integral part of SolrCloud, without it you don't
_have_ SolrCloud.

Best,
Erick

On Fri, Dec 15, 2017 at 1:03 PM, Sundaram, Dinesh
 wrote:
> Thanks again for your valuable reply. Yes that’s correct. Is there a way to 
> start solr alone without any embedded/external zookeeper in solrcloud mode?
>
>
> Dinesh Sundaram
> MBS Platform Engineering
>
> Mastercard
>
>
>
> -Original Message-
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: Wednesday, December 13, 2017 4:54 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr ssl issue while creating collection
>
> On 12/13/2017 3:16 PM, Sundaram, Dinesh wrote:
>> Thanks Shawn for your input. Are these errors specific only to zookeeper
>> operations? If so, is there any way to turn off the default zookeeper which
>> runs on 9983?
>
> If you don't want to start the embedded zookeeper, then you want to be sure 
> that you have a zkHost defined which lists all of the hosts in your external 
> ensemble.  You can either define ZK_HOST in the include script, or use the -z 
> option when starting Solr manually.  When Solr is provided with information 
> about ZK hosts, it does NOT start the embedded ZK.
>
> The exceptions you're seeing have nothing to do with zookeeper.  The latest 
> exception you mentioned is caused by one SolrCloud instance sending HTTPS 
> requests to another SolrCloud instance, and failing to validate SSL because 
> the hostname doesn't match the info in the certificate.
>
> Thanks,
> Shawn
>
>
> CONFIDENTIALITY NOTICE This e-mail message and any attachments are only for 
> the use of the intended recipient and may contain information that is 
> privileged, confidential or exempt from disclosure under applicable law. If 
> you are not the intended recipient, any disclosure, distribution or other use 
> of this e-mail message or attachments is prohibited. If you have received 
> this e-mail message in error, please delete and notify the sender 
> immediately. Thank you.
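
As a concrete illustration of Shawn's point, either of the following tells Solr about the external ensemble and so prevents the embedded ZK from starting (hostnames are placeholders):

```shell
# In the include script (solr.in.sh), point Solr at the external ensemble:
ZK_HOST="zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181"

# Or pass the ensemble on the command line when starting manually:
bin/solr start -cloud -z "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181"
```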


Re: Debugging custom RequestHander: spinning up a core for debugging

2017-12-15 Thread Erick Erickson
My guess is this isn't a Solr issue at all; you are somehow using an old Java.

RBBIDataWrapper is from

com.ibm.icu.text;

I saw on a quick Google that this was cured by re-installing Eclipse,
but that was from 5 years ago.

You say your Java and IDE skills are a bit rusty, maybe you haven't
updated your Java JDK or Eclipse in a while? I don't know if Eclipse
somehow has its own Java (I haven't used Eclipse for quite a while).

I take it this runs outside Eclipse OK? (well, with problems otherwise
you wouldn't be stepping through it.)

Best,
Erick

On Fri, Dec 15, 2017 at 1:16 PM, Tod Olson  wrote:
> Hi everyone,
>
> I need to do some step-wise debugging on a custom RequestHandler. I'm trying 
> to spin up a core in a Junit test, with the idea of running it inside of 
> Eclipse for debugging. (If there's an easier way, I'd like to see a walk 
> through!) Problem is the core fails to spin up with:
>
> java.io.IOException: Break Iterator Rule Data Magic Number Incorrect, or 
> unsupported data version
>
> Here's the code, just trying to load (cribbed and adapted from 
> https://stackoverflow.com/questions/45506381/how-to-debug-solr-plugin):
>
> public class BrowseHandlerTest
> {
> private static CoreContainer container;
> private static SolrCore core;
>
> private static final Logger logger = Logger.getGlobal();
>
>
>
> @BeforeClass
> public static void prepareClass() throws Exception
> {
> String solrHomeProp = "solr.solr.home";
> System.out.println(solrHomeProp + "= " + 
> System.getProperty(solrHomeProp));
> // create the core container from the solr.solr.home system property
> container = new CoreContainer();
> container.load();
> core = container.getCore("biblio");
> logger.info("Solr core loaded!");
> }
>
> @AfterClass
> public static void cleanUpClass()
> {
> core.close();
> container.shutdown();
> logger.info("Solr core shut down!");
> }
> }
>
> The test, run through ant, fails as follows:
>
> [junit] solr.solr.home= /Users/tod/src/vufind/solr/vufind
> [junit] SLF4J: Defaulting to no-operation (NOP) logger implementation
> [junit] SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for 
> further details.
> [junit] SLF4J: Failed to load class "org.slf4j.impl.StaticMDCBinder".
> [junit] SLF4J: Defaulting to no-operation MDCAdapter implementation.
> [junit] SLF4J: See http://www.slf4j.org/codes.html#no_static_mdc_binder 
> for further details.
> [junit] Tests run: 0, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 
> 1.299 sec
> [junit]
> [junit] - Standard Error -
> [junit] SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> [junit] SLF4J: Defaulting to no-operation (NOP) logger implementation
> [junit] SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for 
> further details.
> [junit] SLF4J: Failed to load class "org.slf4j.impl.StaticMDCBinder".
> [junit] SLF4J: Defaulting to no-operation MDCAdapter implementation.
> [junit] SLF4J: See http://www.slf4j.org/codes.html#no_static_mdc_binder 
> for further details.
> [junit] -  ---
> [junit] Testcase: org.vufind.solr.handler.tests.BrowseHandlerTest: Caused 
> an ERROR
> [junit] SolrCore 'biblio' is not available due to init failure: JVM Error 
> creating core [biblio]: null
> [junit] org.apache.solr.common.SolrException: SolrCore 'biblio' is not 
> available due to init failure: JVM Error creating core [biblio]: null
> [junit]  at 
> org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1066)
> [junit]  at 
> org.vufind.solr.handler.tests.BrowseHandlerTest.prepareClass(BrowseHandlerTest.java:45)
> [junit] Caused by: org.apache.solr.common.SolrException: JVM Error 
> creating core [biblio]: null
> [junit]  at 
> org.apache.solr.core.CoreContainer.create(CoreContainer.java:833)
> [junit]  at 
> org.apache.solr.core.CoreContainer.access$000(CoreContainer.java:87)
> [junit]  at 
> org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:467)
> [junit]  at 
> org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:458)
> [junit]  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [junit]  at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:231)
> [junit]  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [junit]  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [junit]  at java.lang.Thread.run(Thread.java:745)
> [junit] Caused by: java.lang.ExceptionInInitializerError
> [junit]  at 
> org.apache.lucene.analysis.icu.segmentation.ICUTokenizerFactory.inform(ICUTokenizerFactory.java:107)
> 
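
A quick way to sanity-check the ICU theory (paths are assumptions): lucene-analyzers-icu and icu4j must be a matched pair, and a stale icu4j on the classpath, e.g. one bundled with the IDE, produces exactly this magic-number error when it reads break-rule data compiled for a different version.

```shell
# List every icu4j and lucene-analyzers-icu jar visible to the project,
# then check that the icu4j version matches what your Lucene release ships
find /path/to/solr /path/to/project -name 'icu4j-*.jar' -o -name 'lucene-analyzers-icu-*.jar'
```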

Debugging custom RequestHander: spinning up a core for debugging

2017-12-15 Thread Tod Olson
Hi everyone,

I need to do some step-wise debugging on a custom RequestHandler. I'm trying to 
spin up a core in a Junit test, with the idea of running it inside of Eclipse 
for debugging. (If there's an easier way, I'd like to see a walk through!) 
Problem is the core fails to spin up with:

java.io.IOException: Break Iterator Rule Data Magic Number Incorrect, or 
unsupported data version

Here's the code, just trying to load (cribbed and adapted from 
https://stackoverflow.com/questions/45506381/how-to-debug-solr-plugin):

import java.util.logging.Logger;

import org.apache.solr.core.CoreContainer;
import org.apache.solr.core.SolrCore;
import org.junit.AfterClass;
import org.junit.BeforeClass;

public class BrowseHandlerTest
{
private static CoreContainer container;
private static SolrCore core;

private static final Logger logger = Logger.getGlobal();



@BeforeClass
public static void prepareClass() throws Exception
{
String solrHomeProp = "solr.solr.home";
System.out.println(solrHomeProp + "= " + 
System.getProperty(solrHomeProp));
// create the core container from the solr.solr.home system property
container = new CoreContainer();
container.load();
core = container.getCore("biblio");
logger.info("Solr core loaded!");
}

@AfterClass
public static void cleanUpClass()
{
core.close();
container.shutdown();
logger.info("Solr core shut down!");
}
}

The test, run through ant, fails as follows:

[junit] solr.solr.home= /Users/tod/src/vufind/solr/vufind
[junit] SLF4J: Defaulting to no-operation (NOP) logger implementation
[junit] SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for 
further details.
[junit] SLF4J: Failed to load class "org.slf4j.impl.StaticMDCBinder".
[junit] SLF4J: Defaulting to no-operation MDCAdapter implementation.
[junit] SLF4J: See http://www.slf4j.org/codes.html#no_static_mdc_binder for 
further details.
[junit] Tests run: 0, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 
1.299 sec
[junit]
[junit] - Standard Error -
[junit] SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
[junit] SLF4J: Defaulting to no-operation (NOP) logger implementation
[junit] SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for 
further details.
[junit] SLF4J: Failed to load class "org.slf4j.impl.StaticMDCBinder".
[junit] SLF4J: Defaulting to no-operation MDCAdapter implementation.
[junit] SLF4J: See http://www.slf4j.org/codes.html#no_static_mdc_binder for 
further details.
[junit] -  ---
[junit] Testcase: org.vufind.solr.handler.tests.BrowseHandlerTest: Caused 
an ERROR
[junit] SolrCore 'biblio' is not available due to init failure: JVM Error 
creating core [biblio]: null
[junit] org.apache.solr.common.SolrException: SolrCore 'biblio' is not 
available due to init failure: JVM Error creating core [biblio]: null
[junit]  at 
org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1066)
[junit]  at 
org.vufind.solr.handler.tests.BrowseHandlerTest.prepareClass(BrowseHandlerTest.java:45)
[junit] Caused by: org.apache.solr.common.SolrException: JVM Error creating 
core [biblio]: null
[junit]  at 
org.apache.solr.core.CoreContainer.create(CoreContainer.java:833)
[junit]  at 
org.apache.solr.core.CoreContainer.access$000(CoreContainer.java:87)
[junit]  at 
org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:467)
[junit]  at 
org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:458)
[junit]  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[junit]  at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$1.run(ExecutorUtil.java:231)
[junit]  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[junit]  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[junit]  at java.lang.Thread.run(Thread.java:745)
[junit] Caused by: java.lang.ExceptionInInitializerError
[junit]  at 
org.apache.lucene.analysis.icu.segmentation.ICUTokenizerFactory.inform(ICUTokenizerFactory.java:107)
[junit]  at 
org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:721)
[junit]  at org.apache.solr.schema.IndexSchema.(IndexSchema.java:160)
[junit]  at 
org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:56)
[junit]  at 
org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:70)
[junit]  at 
org.apache.solr.core.ConfigSetService.createIndexSchema(ConfigSetService.java:108)
[junit]  at 
org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:79)
[junit]  at 
org.apache.solr.core.CoreContainer.create(CoreContainer.java:812)
[junit] Caused by: java.lang.RuntimeException: java.io.IOException: Break 
Iterator Rule Data Magic Number Incorrect, or unsupported data version.
[junit]  at 



using rank queries(rq) with grouping in solr cloud

2017-12-15 Thread tomerg
hey,

I'm using Solr 6.5.1 in SolrCloud mode.
I use grouping for my results.
I want to use a rank query (rq) to rerank the top groups (with LTR).
It's OK for me to rerank the groups only by reranking one of the documents
in each group.
I saw in issue SOLR-8776 that rank queries don't support grouping
(link here: https://issues.apache.org/jira/browse/SOLR-8776).

So I have a few questions:
1. Is there some way to bypass this problem (or use some other existing
feature of Solr to achieve similar results)?
2. If there is no other way, I would like to implement a component to
achieve this functionality (I don't want to patch the code of Solr itself).
Do you have a suggestion for the best way to implement reranking of
groups in cloud mode?
Can I implement something that reranks the groups on every shard before
merging, or is there a way to create a component that reranks only the merged
result list from the shards?

thanks,
tomerg



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
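
For reference, reranking without grouping does work today; a sketch of such a request (the collection, model name, and field list are assumptions, and `-g` stops curl from globbing the braces):

```shell
curl -g "http://localhost:8983/solr/myCollection/query?q=test&rq={!ltr%20model=myModel%20reRankDocs=100}&fl=id,score"
```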


Re: indexing data to solrcloud with "implicit" is not distributing across cluster.

2017-12-15 Thread Erick Erickson
You're misinterpreting the docs. _route_ is used to
tell _queries_ where to go, or to route a document
as part of the parameters when you send the doc,
not a field in the doc.

So when you added the _route_ field to the doc, you
didn't have it in the schema in the first place.

So you could add a _route_ field to your schema
and work that way, but then you have to also define
router.field=_route_ when you create the collection.
I'd advise instead just specifying router.field=Status
to avoid confusion.

Now, that said I really question whether this is a good
way to set up your collection. I'd just use compositeId
and when you want to restrict searches to one type
or the other add
fq=Status:Active
or
fq=Status:Terminated

that way you can't forget to delete the doc from one
shard or the other when the status changes. You won't
have lopsided doc counts on your shards because you
have 10,000,000 active docs and 10 terminated docs.
And whatever ratio you start with, it'll change as the
collection ages.

FWIW,
Erick

On Fri, Dec 15, 2017 at 11:17 AM, hemanth  wrote:
> I created a collection with the implicit routing mechanism, and my shard names
> are Active and Disabled; these are the values of one of my collection
> fields: Status. But when I try to upload a document using the Solr UI
> Documents section (upload using JSON format, with all the fields, including
> a Status value of either Terminated or Active), it goes to
> only one default shard. I tried to insert a _route_ field with the value
> "Terminated", and when I try to insert the document, I get
>
> *unknown field '_route_' Error from server*. Am I doing this the correct way?
> Does implicit routing work on the hash value of the routing field, rather than
> sending the document to the shard named by the value of the routing field?
>
> I want to store documents with Status value Active in the
> myCollection_Active shard and documents with Status value Terminated
> in the myCollection_Terminated shard, automatically, based on the
> Status field in the document. I used implicit routing when creating the
> collection and gave the shard names Active,Terminated. Please help. I am using
> Solr 6.6.
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
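
A sketch of creating a collection routed by the Status field, per the router.field suggestion above (the collection and config-set names are assumptions):

```shell
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=myCollection&router.name=implicit&shards=Active,Terminated&router.field=Status&collection.configName=myConfig"
```

With router.field set, every document must carry a Status value naming an existing shard; documents are then routed by that value rather than by a hash of the uniqueKey.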




Re: Solr 6.6 using swap space causes recovery?

2017-12-15 Thread Erick Erickson
One mechanism that comes to mind is if the swapping slows down an update.

Here's the process
- Leader sends doc to follower
- follower times out
- leader says "that replica must be sick, I'll tell it to recover"

The smoking gun here is if you see any messages about
"leader-initiated recovery". grep for leader and initiated, because
some of the text is inconsistent, may be "leader initiated" or
"leader-initiated". And grep on both leader and follower.

On the leader you should also see messages in the log about the update
timing out.

The follower won't have any errors at all; the doc indexed
successfully it just took a long time.

This is a total shot in the dark BTW. I'm rather surprised that it
takes 3 or more hours for 200K docs. If they're really big and you're
sending them in batches you may just be taking a while to process the
batch and swapping may have nothing to do with it. Perhaps smaller
batches would help if that's the case.

Long GC pauses can also cause this to happen BTW, ZooKeeper will
periodically ping the node to see if it's up. If it times out ZK can
cause the node to go into recovery (actually, I think the leader gets
a message and puts the follower into recovery). Examining the GC logs
should tell you whether that's possible. 3-4 second stop-the-world GC
pauses (and those are excessive IMO) shouldn't hurt. Although with
heaps that small I wouldn't expect much in the way of stop-the-world
GC pauses.

Best,
Erick

On Fri, Dec 15, 2017 at 10:29 AM, Shawn Heisey  wrote:
> On 12/15/2017 10:53 AM, Bill Oconnor wrote:
>> The recovering server has a much larger swap usage than the other servers in
>> the cluster. We think this is related to the mmap files used for indexes.
>> The server eventually recovers, but it triggers alerts for devops which are
>> annoying.
>>
>> I have found a previous mailing list question (Shawn responded to) with almost
>> an identical problem from 2014, but there is no suggested remedy. (
>> http://lucene.472066.n3.nabble.com/Solr-4-3-1-memory-swapping-td4126641.html)
>
> Solr itself cannot influence swap usage.  This is handled by the
> operating system.  I have no idea whether Java can influence swap usage,
> but if it can, this is also outside our control and I have no idea what
> to tell you.  My guess is that Java is unlikely to influence swap usage,
> but only a developer on the JDK team could tell you that for sure.
>
> Assuming we're dealing with a Linux machine, my recommendation would be
> to set vm.swappiness to 0 or 1, so that the OS is not aggressively
> deciding that data should be swapped.  The default for vm.swappiness on
> Linux is 60, which is quite aggressive.
>
>> Questions :
>>
>> Is there progress regarding this?
>
> As mentioned previously, there's nothing Solr or Lucene can do to help,
> because it's software completely outside our ability to influence.  The
> operating system will make those decisions.
>
> If your software tries to use more memory than the machine has, then
> swap is going to get used no matter how the OS is configured, and when
> that happens, performance will suffer greatly.  In the case of
> SolrCloud, it would make basic operation go so slowly that timeouts
> would get exceeded, and Solr would initiate recovery.
>
> If the OS is Linux, I would like to see a screenshot from the "top"
> program.  (not htop, or anything else, the program needs to be top).
> Run the program, press shift-M to sort the list by memory, and grab a
> screenshot.
>
> Thanks,
> Shawn
>
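
Shawn's vm.swappiness suggestion, as commands (Linux; requires root):

```shell
# Check the current value (60 is the default on most distributions)
sysctl vm.swappiness
# Lower it for the running system
sudo sysctl vm.swappiness=1
# Persist the setting across reboots
echo 'vm.swappiness = 1' | sudo tee -a /etc/sysctl.conf
```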


Re: How to restart solr in docker?

2017-12-15 Thread Deepak Vohra
Is the Docker container still running, which may be listed with docker ps?

Solr may be started with:

docker exec -it my_solr start




On Fri, 12/15/17, Buckler, Christine  wrote:

 Subject: How to restart solr in docker?
 To: "solr-user@lucene.apache.org" 
 Received: Friday, December 15, 2017, 8:55 AM
 
 What is the command for restarting solr on a
 docker image? I have modified the solrconfig.xml to add the
 suggest plugin and now I want to update to reflect this
 change. I was able to stop solr using “$ docker exec -it
 my_solr stop all” but now I can’t figure out how to
 restart.
 
 Thanks,
 Christine Buckler
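
One caveat worth adding: in the official solr image, the Solr process is the container's main process, so "stop all" stops the container itself. If that is your setup, restarting the container is the simplest path (the container name is an assumption):

```shell
docker restart my_solr
```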
 
 


Re: SolrCloud

2017-12-15 Thread John Davis
Thanks Erick. I agree SolrCloud is better than master/slave, however we
have some questions between managing replicas separately vs with solrcloud.
For eg how much overhead do SolrCloud nodes have wrt memory/cpu/disk in
order to be able to sync pending index updates to other replicas? What
monitoring and safeguards are in place out of the box so too many pending
updates for unreachable replicas don't make the alive ones fall over? Or a
new replica doesn't overwhelm existing replica.

Of course everything works great when things are running well but when
things go south our preference would be for solr to not fall over as first
priority.

On Fri, Dec 15, 2017 at 9:41 AM, Erick Erickson 
wrote:

> The main advantage in SolrCloud in your setup is HA/DR. You say you
> have multiple replicas and shards. Either you have to index to each
> replica separately or you use master/slave replication. In either case
> you have to manage and fix the case where some node goes down. If
> you're using master/slave, if the master goes down you need to get in
> there and fix it, reassign the master, make config changes, restart
> Solr to pick them up, make sure you pick up any missed updates and all
> that.
>
> in SolrCloud that is managed for you. Plus, let's say you want to
> increase QPS capacity. In SolrCloud all you do is use the collections
> API ADDREPLICA command and you're done. It gets created (and you can
> specify exactly what node if you want), the index gets copied, new
> updates are automatically routed to it and it starts serving requests
> when it's synchronized all automagically. Symmetrically you can
> DELETEREPLICA if you have too much capacity.
>
> The price here is you have to get comfortable with maintaining
> ZooKeeper admittedly.
>
> Also in the 7x world you have different types of replicas, TLOG, PULL
> and NRT that combine some of the features of master/slave with
> SolrCloud.
>
> Generally my rule of thumb is the minute you get beyond a single shard
> you should move to SolrCloud. If all your data fits in one Solr core
> then it's less clear-cut, master/slave can work just fine. It Depends
> (tm) of course.
>
> Your use case is "implicit" (being renamed "manual") routing when you
> create your Solr collection. There are pros and cons here, but that's
> beyond the scope of your question. Your infrastructure should port
> pretty directly to SolrCloud. The short form is that all your indexing
> and/or querying is happening on a single node when using manual
> routing rather than in parallel. Of course executing parallel
> sub-queries imposes its own overhead.
>
> If your use-case for having these on a single shard it to segregate
> the data by some set (say users), you might want to consider just
> using separate _collections_ in SolrCloud where old_shard ==
> new_collection, basically all your routing is the same. You can create
> aliases pointing to multiple collections or specify multiple
> collections on the query, don't know if that fits your use case or not
> though.
>
>
> Best,
> Erick
>
> On Fri, Dec 15, 2017 at 9:03 AM, John Davis 
> wrote:
> > Hello,
> > We are thinking about migrating to SolrCloud. Our current setup is:
> > 1. Multiple replicas and shards.
> > 2. Each query typically hits a single shard only.
> > 3. We have an external system that assigns a document to a shard based on
> > it's origin and is also used by solr clients when querying to find the
> > correct shard to query.
> >
> > It looks like the biggest advantage of SolrCloud is #3 - to route
> document
> > to the correct shard & replicas when indexing and to route query
> similarly.
> > Given we already have a fairly reliable system to do this, are there
> other
> > benefits from migrating to SolrCloud?
> >
> > Thanks,
> > John
>
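
For reference, the ADDREPLICA operation Erick mentions is a single Collections API request; a sketch (the collection, shard, and node names are assumptions):

```shell
curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=myCollection&shard=shard1&node=192.168.1.5:8983_solr"
```

The new replica is created on the named node, pulls a copy of the index, and starts serving once it has caught up; DELETEREPLICA is the symmetric call for removing capacity.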


Re: legacy replication

2017-12-15 Thread David Hastings
Understandable.  Right now we have a large set up of solr 5.x servers that
has been doing great for years.  But the time to upgrade has come, with
some things that we want that are not available in the 5.x branch.  I
really like legacy ( master/slave) replication, for the reasons you stated,
but also the fact that the cloud set up seems perfect, if you have a
handful of cheap machines around.  Our production set up has 1 indexer,
which has a 5 minute polling slave, and on releases we have 3 searching
servers that poll manually. Thing is, these machines have over 32 cores
and over 200GB of RAM with 2TB SSDs each; these were not cheap and are
pretty fast with standalone Solr. Also, the complexity of adding another 3
or more machines just to do nothing but ZK stuff was getting out of hand.
If it's not broken, I'm not about to fix it.

In any case, I'm glad to hear legacy replication will stay.
Thanks,
-Dave

On Fri, Dec 15, 2017 at 1:15 PM, Walter Underwood 
wrote:

> I love legacy replication. It is simple and bulletproof. Loose coupling
> for the win! We only run Solr Cloud when we need sharding or NRT search.
> Loose coupling is a very, very good thing in distributed systems.
>
> Adding a replica (new slave) is trivial. Clone an existing one. This makes
> horizontal scaling so easy. We still haven’t written the procedure and
> scripts for scaling our Solr Cloud cluster. Last time, it was 100% manual
> through the admin UI.
>
> Setting up a Zookeeper ensemble isn’t as easy as it should be. We tried to
> set up a five node ensemble with ZK 3.4.6 and finally gave up after two
> weeks because it was blocking the release. We are using the three node
> 3.4.5 ensemble that had been set up for something else a couple of years
> earlier. I’ve had root on Unix since 1981 and have been running TCP/IP
> since 1983, so I should have been able to figure this out.
>
> We’ve had some serious prod problems with the Solr Cloud cluster, like
> cores stuck in a permanent recovery loop. I finally manually deleted that
> core and created a new one. Ugly.
>
> Even starting Solr Cloud processes is confusing. It took a while to figure
> out they were all joining as the same host (no, I don’t know why), so now
> we start them as: solr start -cloud -h `hostname`
>
> Keeping configs under source control and deploying them isn’t easy. I’m
> not going to install Solr on the Jenkins executor just so it can deploy,
> that is weird and kind of a chicken and egg thing. I ended up writing a
> Python program to get the ZK address from the cluster, use kazoo to load
> directly to ZK, then tell the cluster to reload. Both with that and with
> the provided ZK tools I ran into so much undocumented stuff. What is
> linking? How do the file config directories map to the ZK config
> directories? And so on.
>
> The lack of a thread pool for requests is a very serious problem. If our
> 6.5.1 cluster gets overloaded, it creates 4000 threads, runs out of memory
> and fails. That is just wrong. With earlier versions of Solr, it would get
> slower and slower, but recover gracefully.
>
> Converting a slave into a master is easy. We use this in the config file:
>
>
>   <lst name="master">
>     <str name="enable">${enable.master:false}</str>
>     …
>   </lst>
>   <lst name="slave">
>     <str name="enable">${textbooks.enable.slave:false}</str>
>   </lst>
>
> And this at startup (slave config shown): -Denable.master=false
> -Denable.slave=true
>
> Change the properties and restart.
>
> Our 6.5.1 cluster is faster than the non-sharded 4.10.4 master/slave
> cluster, but I’m not happy with the stability in prod. We’ve had more
> search outages in the past six months than we had in the previous four
> years. I’ve had Solr in prod since version 1.2, and this is the first time
> it has really embarrassed me.
>
> There are good things. Search is faster, we’re handling double the query
> volume with 3X the docs.
>
> Sorry for the rant, but it has not been a good fall semester for our
> students (customers).
>
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Dec 15, 2017, at 9:46 AM, Erick Erickson 
> wrote:
> >
> > There's pretty much zero chance that it'll go away, too much current
> > and ongoing functionality that depends on it.
> >
> > 1> old-style replication has always been used for "full sync" in
> > SolrCloud when peer sync can't be done.
> >
> > 2> The new TLOG and PULL replica types are a marriage of old-style
> > master/slave and SolrCloud. In particular a PULL replica is
> > essentially an old-style slave. A TLOG replica is an old-style slave
> > that also maintains a transaction log so it can take over leadership
> > if necessary.
> >
> > Best,
> > Erick
> >
> > On Fri, Dec 15, 2017 at 8:56 AM, David Hastings
> >  wrote:
> >> So i dont step on the other thread, I want to be assured whether or not
> >> legacy master/slave/repeater replication will continue to be supported
> in
> >> future solr versions.  our infrastructure is set up 

Re: Solr 6.6 using swap space causes recovery?

2017-12-15 Thread Shawn Heisey
On 12/15/2017 10:53 AM, Bill Oconnor wrote:
> The recovering server has a much larger swap usage than the other servers in 
> the cluster. We think this is related to the mmap files used for indexes. 
> The server eventually recovers but it triggers alerts for devops which are 
> annoying.
>
> I have found a previous mailing list question (that Shawn responded to) with an 
> almost identical problem from 2014, but there is no suggested remedy. ( 
> http://lucene.472066.n3.nabble.com/Solr-4-3-1-memory-swapping-td4126641.html)

Solr itself cannot influence swap usage.  This is handled by the
operating system.  I have no idea whether Java can influence swap usage,
but if it can, this is also outside our control and I have no idea what
to tell you.  My guess is that Java is unlikely to influence swap usage,
but only a developer on the JDK team could tell you that for sure.

Assuming we're dealing with a Linux machine, my recommendation would be
to set vm.swappiness to 0 or 1, so that the OS is not aggressively
deciding that data should be swapped.  The default for vm.swappiness on
Linux is 60, which is quite aggressive.
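On a typical Linux box the current value can be checked and changed as follows (a hedged sketch; the write needs root, and the exact value is a tuning choice):

```shell
# Show the current swappiness value (the default is often 60):
cat /proc/sys/vm/swappiness
# Lower it for the running kernel (root required):
#   sysctl -w vm.swappiness=1
# Persist across reboots by adding this line to /etc/sysctl.conf:
#   vm.swappiness = 1
```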

> Questions :
>
> Is there progress regarding this?

As mentioned previously, there's nothing Solr or Lucene can do to help,
because it's software completely outside our ability to influence.  The
operating system will make those decisions.

If your software tries to use more memory than the machine has, then
swap is going to get used no matter how the OS is configured, and when
that happens, performance will suffer greatly.  In the case of
SolrCloud, it would make basic operation go so slowly that timeouts
would get exceeded, and Solr would initiate recovery.

If the OS is Linux, I would like to see a screenshot from the "top"
program.  (not htop, or anything else, the program needs to be top). 
Run the program, press shift-M to sort the list by memory, and grab a
screenshot.

Thanks,
Shawn



Re: legacy replication

2017-12-15 Thread Walter Underwood
I love legacy replication. It is simple and bulletproof. Loose coupling for the 
win! We only run Solr Cloud when we need sharding or NRT search. Loose coupling 
is a very, very good thing in distributed systems.

Adding a replica (new slave) is trivial. Clone an existing one. This makes 
horizontal scaling so easy. We still haven’t written the procedure and scripts 
for scaling our Solr Cloud cluster. Last time, it was 100% manual through the 
admin UI.

Setting up a Zookeeper ensemble isn’t as easy as it should be. We tried to set 
up a five node ensemble with ZK 3.4.6 and finally gave up after two weeks 
because it was blocking the release. We are using the three node 3.4.5 ensemble 
that had been set up for something else a couple of years earlier. I’ve had 
root on Unix since 1981 and have been running TCP/IP since 1983, so I should 
have been able to figure this out.

We’ve had some serious prod problems with the Solr Cloud cluster, like cores 
stuck in a permanent recovery loop. I finally manually deleted that core and 
created a new one. Ugly.

Even starting Solr Cloud processes is confusing. It took a while to figure out 
they were all joining as the same host (no, I don’t know why), so now we start 
them as: solr start -cloud -h `hostname`

Keeping configs under source control and deploying them isn’t easy. I’m not 
going to install Solr on the Jenkins executor just so it can deploy, that is 
weird and kind of a chicken and egg thing. I ended up writing a Python program 
to get the ZK address from the cluster, use kazoo to load directly to ZK, then 
tell the cluster to reload. Both with that and with the provided ZK tools I ran 
into so much undocumented stuff. What is linking? How do the file config 
directories map to the ZK config directories? And so on.
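For what it's worth, Solr 6.x also ships a ZK subcommand that avoids a hand-rolled uploader. A hedged sketch, assuming bin/solr is on the path and a reachable ensemble (config name, path, hosts, and collection name are placeholders):

```shell
# Upload a config directory to ZooKeeper under a named configset:
bin/solr zk upconfig -n myconfig -d /path/to/conf -z zk1:2181,zk2:2181,zk3:2181
# Then tell the collection to pick up the new config:
curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection"
```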

The lack of a thread pool for requests is a very serious problem. If our 6.5.1 
cluster gets overloaded, it creates 4000 threads, runs out of memory and fails. 
That is just wrong. With earlier versions of Solr, it would get slower and 
slower, but recover gracefully.

Converting a slave into a master is easy. We use this in the config file:

   
  ${enable.master:false}
  …
  
 ${textbooks.enable.slave:false}

And this at startup (slave config shown): -Denable.master=false 
-Denable.slave=true

Change the properties and restart.
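The archive stripped the XML tags from the solrconfig.xml snippet above. For readers trying to reproduce the setup, a typical ReplicationHandler stanza gated by such properties looks roughly like this (host, core, and file names are placeholders, and the actual property names in the original may differ):

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">http://master-host:8983/solr/core1/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```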

Our 6.5.1 cluster is faster than the non-sharded 4.10.4 master/slave cluster, 
but I’m not happy with the stability in prod. We’ve had more search outages in 
the past six months than we had in the previous four years. I’ve had Solr in 
prod since version 1.2, and this is the first time it has really embarrassed me.

There are good things. Search is faster, we’re handling double the query volume 
with 3X the docs.

Sorry for the rant, but it has not been a good fall semester for our students 
(customers).

Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Dec 15, 2017, at 9:46 AM, Erick Erickson  wrote:
> 
> There's pretty much zero chance that it'll go away, too much current
> and ongoing functionality that depends on it.
> 
> 1> old-style replication has always been used for "full sync" in
> SolrCloud when peer sync can't be done.
> 
> 2> The new TLOG and PULL replica types are a marriage of old-style
> master/slave and SolrCloud. In particular a PULL replica is
> essentially an old-style slave. A TLOG replica is an old-style slave
> that also maintains a transaction log so it can take over leadership
> if necessary.
> 
> Best,
> Erick
> 
> On Fri, Dec 15, 2017 at 8:56 AM, David Hastings
>  wrote:
>> So i dont step on the other thread, I want to be assured whether or not
>> legacy master/slave/repeater replication will continue to be supported in
>> future solr versions.  our infrastructure is set up for this and all the HA
> >> redundancies that solrcloud provides. We have already spent a lot of time
>> and resources with very expensive servers to handle solr in standalone
>> mode.
>> 
>> thanks.
>> -David



Re: How to sort on dates?

2017-12-15 Thread Shawn Heisey
On 12/15/2017 2:53 AM, Georgios Petasis wrote:
> I have a field of type "date_range" defined as:
>
>  multiValued="false" indexed="true" stored="true"/>
>
> The problem is that sorting on this field does not work (despite the
> fact that I put dates in there). Instead I get an error prompting to
> perform sorting through a query.

Stating what Michael said in a different way:  Entries in a
DateRangeField can be a date range, not just a single timestamp.  How
would you decide what specific date to use in a sort?  The start of the
range?  The end of the range?  The middle of the range?  Any option that
the developers chose would be wrong for somebody, and it's not a
straightforward thing to make that choice configurable.

Michael suggested DatePointField.  That should work, because this type
holds a single timestamp, not a range.
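A minimal sketch of what that suggestion looks like in a schema (the field and type names here are invented; docValues is what makes sorting efficient):

```xml
<fieldType name="pdate" class="solr.DatePointField" docValues="true"/>
<!-- single-valued timestamp, sortable via docValues -->
<field name="published_dt" type="pdate" indexed="true" stored="true"/>
```

A query can then sort with a plain sort parameter, e.g. sort=published_dt asc.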

The reason that DateRangeField is deprecated is that it uses a legacy
Lucene class that will no longer be available in 8.0.  Coming up with a
replacement is one of the many things that must be addressed before the
8.0 release.

Thanks,
Shawn



Solr 6.6 using swap space causes recovery?

2017-12-15 Thread Bill Oconnor
Hello,


We recently upgraded to SolrCloud 6.6. We are running on Ubuntu servers LTS 
14.x - VMware on Nutanics boxs. We have 4 nodes with 32GB each and 16GB for the 
jvm with 12GB minimum. Usually it is only using 4-7GB.


We do nightly indexing of partial fields for all our docs ~200K. This usually 
takes 3hr using 10 threads. About every other week we have a server go into 
recovery mode during the update. The recovering server has a much larger swap 
usage than the other servers in the cluster. We think this is related to the 
mmap files used for indexes. The server eventually recovers but it triggers 
alerts for devops which are annoying.


I have found a previous mailing list question (that Shawn responded to) with an 
almost identical problem from 2014, but there is no suggested remedy. ( 
http://lucene.472066.n3.nabble.com/Solr-4-3-1-memory-swapping-td4126641.html)


Questions :


Is there progress regarding this?


Some kind of configuration that can mitigate this?


Maybe this is a lucene issue.


Thanks,

Bill OConnor (www.plos.org)


How to restart solr in docker?

2017-12-15 Thread Buckler, Christine
What is the command for restarting solr on a docker image? I have modified the 
solrconfig.xml to add the suggest plugin and now I want to update to reflect 
this change. I was able to stop solr using “$ docker exec -it my_solr stop all” 
but now I can’t figure out how to restart.

Thanks,
Christine Buckler


Re: Solr upgrade from 4.x to 7.1

2017-12-15 Thread Erick Erickson
What advantage do you see in TLOG and/or PULL replicas? The TLOG and
PULL replica types are for some pretty specific use cases,
particularly high-indexing-throughput cases where you can't afford to
index each doc on every node in your cluster. If you can afford the
CPU cycles to index on every node, I'd stick with NRT replicas.

Yes, though, you can mix/match the types. I question whether you
_want_ to, but you know your app best.

Best,
Erick

On Fri, Dec 15, 2017 at 8:44 AM, Drooy Drooy  wrote:
> Hi Erick/Robi,
>
> Thanks for your replies! one more question, if I go with solrcloud and
> having PULL/TLOG replication mixed in the cluster, by the documentation,
> this would have Master/Slave mode benefit as well,  does that seem feasible
> to you? are there any projects adopting this new feature in 7.0/7.1 ?
>
> Thanks
> Drooy
>
>
> On Thu, Dec 14, 2017 at 7:15 PM, Erick Erickson 
> wrote:
>
>> Completely agree with Robert. I'd also add that you should _not_ copy
>> your configs from 4x. Start with the 7x configs and add any
>> customizations but don't change things like luceneMatchVersion and the
>> like.
>>
>> If you simply _cannot_ reindex, take a look at
>> https://github.com/cominvent/solr-tools/blob/master/
>> upgradeindex/upgradeindex.sh
>>
>> On Thu, Dec 14, 2017 at 2:24 PM, Petersen, Robert (Contr)
>>  wrote:
>> > From what I have read, you can only upgrade to the next major version
>> number without using a tool to convert the indexes to the newer version.
> > But that is still perilous due to deprecations etc
>> >
>> >
>> > So I think best advice out there is to spin up a new farm on 7.1
>> (especially from 4.x), make a new collection there, reindex everything into
>> it and then switch over to the new farm. I would also ask the question are
>> you thinking to go to master/slave on 7.1? Wouldn't you want to go with
>> solr cloud?
>> >
>> >
>> > I started with master/slave and yes it is simpler but there is that one
>> single point of failure (the master) for indexing, which is of course
> easily manually overcome by repurposing a slave as the new master and
>> repointing the remaining slaves at the new master however this is a
>> completely manual process you try to avoid in cloud mode.
>> >
>> >
>> > I think you'd need to think this through more fully with the new
>> possibilities available and how you'd want to migrate given your existing
>> environment is so far behind.
>> >
>> >
>> > Thanks
>> >
>> > Robi
>> >
>> > 
>> > From: Drooy Drooy 
>> > Sent: Thursday, December 14, 2017 1:27:53 PM
>> > To: solr-user@lucene.apache.org
>> > Subject: Solr upgrade from 4.x to 7.1
>> >
>> > Hi All,
>> >
>> > We have an in-house project running in Solr 4.7 with Master/Slave mode
>> for
>> > a few years, what is it going to take to upgrade it to SolrCloud with
>> > TLOG/PULL replica mode ?
>> >
>> > I read the upgrade guides, none of them talking about the jump from 4.x
>> to
>> > 7.
>> >
>> > Thanks much
>> >
>> > 
>> >
>> > This communication is confidential. Frontier only sends and receives
>> email on the basis of the terms set out at http://www.frontier.com/email_
>> disclaimer.
>>


Re: legacy replication

2017-12-15 Thread Erick Erickson
There's pretty much zero chance that it'll go away, too much current
and ongoing functionality that depends on it.

1> old-style replication has always been used for "full sync" in
SolrCloud when peer sync can't be done.

2> The new TLOG and PULL replica types are a marriage of old-style
master/slave and SolrCloud. In particular a PULL replica is
essentially an old-style slave. A TLOG replica is an old-style slave
that also maintains a transaction log so it can take over leadership
if necessary.
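To make the new replica types concrete, here is a hedged sketch of creating a 7.x collection that mixes them via the Collections API (the host and collection names are invented):

```shell
# Hypothetical host/collection; requires a running SolrCloud 7.x cluster.
CREATE_URL="http://localhost:8983/solr/admin/collections?action=CREATE&name=books&numShards=2&nrtReplicas=1&tlogReplicas=1&pullReplicas=2"
echo "$CREATE_URL"
# Execute against a live cluster with:
#   curl "$CREATE_URL"
```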

Best,
Erick

On Fri, Dec 15, 2017 at 8:56 AM, David Hastings
 wrote:
> So i dont step on the other thread, I want to be assured whether or not
> legacy master/slave/repeater replication will continue to be supported in
> future solr versions.  our infrastructure is set up for this and all the HA
> redundancies that solrcloud provides. We have already spent a lot of time
> and resources with very expensive servers to handle solr in standalone
> mode.
>
> thanks.
> -David


Re: SolrCloud

2017-12-15 Thread Erick Erickson
The main advantage in SolrCloud in your setup is HA/DR. You say you
have multiple replicas and shards. Either you have to index to each
replica separately or you use master/slave replication. In either case
you have to manage and fix the case where some node goes down. If
you're using master/slave, if the master goes down you need to get in
there and fix it, reassign the master, make config changes, restart
Solr to pick them up, make sure you pick up any missed updates and all
that.

in SolrCloud that is managed for you. Plus, let's say you want to
increase QPS capacity. In SolrCloud all you do is use the collections
API ADDREPLICA command and you're done. It gets created (and you can
specify exactly what node if you want), the index gets copied, new
updates are automatically routed to it and it starts serving requests
when it's synchronized all automagically. Symmetrically you can
DELETEREPLICA if you have too much capacity.
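As a sketch of what that looks like on the wire (collection, shard, and replica names below are hypothetical, and a live SolrCloud node is assumed):

```shell
# Build the ADDREPLICA request; run the commented curl against a real cluster.
SOLR="http://localhost:8983/solr"
ADD_URL="$SOLR/admin/collections?action=ADDREPLICA&collection=books&shard=shard1"
echo "$ADD_URL"
#   curl "$ADD_URL"
# Symmetrically, to shed capacity later:
#   curl "$SOLR/admin/collections?action=DELETEREPLICA&collection=books&shard=shard1&replica=core_node5"
```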

The price here is you have to get comfortable with maintaining
ZooKeeper admittedly.

Also in the 7x world you have different types of replicas, TLOG, PULL
and NRT that combine some of the features of master/slave with
SolrCloud.

Generally my rule of thumb is the minute you get beyond a single shard
you should move to SolrCloud. If all your data fits in one Solr core
then it's less clear-cut, master/slave can work just fine. It Depends
(tm) of course.

Your use case is "implicit" (being renamed "manual") routing when you
create your Solr collection. There are pros and cons here, but that's
beyond the scope of your question. Your infrastructure should port
pretty directly to SolrCloud. The short form is that all your indexing
and/or querying is happening on a single node when using manual
routing rather than in parallel. Of course executing parallel
sub-queries imposes its own overhead.

If your use-case for having these on a single shard is to segregate
the data by some set (say users), you might want to consider just
using separate _collections_ in SolrCloud where old_shard ==
new_collection, basically all your routing is the same. You can create
aliases pointing to multiple collections or specify multiple
collections on the query, don't know if that fits your use case or not
though.


Best,
Erick

On Fri, Dec 15, 2017 at 9:03 AM, John Davis  wrote:
> Hello,
> We are thinking about migrating to SolrCloud. Our current setup is:
> 1. Multiple replicas and shards.
> 2. Each query typically hits a single shard only.
> 3. We have an external system that assigns a document to a shard based on
> it's origin and is also used by solr clients when querying to find the
> correct shard to query.
>
> It looks like the biggest advantage of SolrCloud is #3 - to route document
> to the correct shard & replicas when indexing and to route query similarly.
> Given we already have a fairly reliable system to do this, are there other
> benefits from migrating to SolrCloud?
>
> Thanks,
> John


Re: How to restart solr in docker?

2017-12-15 Thread Shawn Heisey
On 12/15/2017 9:55 AM, Buckler, Christine wrote:
> What is the command for restarting solr on a docker image? I have modified 
> the solrconfig.xml to add the suggest plugin and now I want to update to 
> reflect this change. I was able to stop solr using “$ docker exec -it my_solr 
> stop all” but now I can’t figure out how to restart.

You would need to ask whoever created the docker image.  There are no
docker images produced by the Solr project.

Thanks,
Shawn



Re: How to restart solr in docker?

2017-12-15 Thread Jamie Jackson
The usual procedure with containers is to restart or recreate the container.
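For instance, assuming a container named my_solr based on the community docker-solr image (a sketch; flags and image name may differ in your setup):

```shell
# Restart the existing container; the image's entrypoint starts Solr again:
docker restart my_solr
# Or recreate it entirely, so changes baked into the image or volume are re-read:
docker stop my_solr
docker rm my_solr
docker run -d --name my_solr -p 8983:8983 solr
```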

On Fri, Dec 15, 2017 at 11:55 AM, Buckler, Christine <
christine.buck...@nordstrom.com> wrote:

> What is the command for restarting solr on a docker image? I have modified
> the solrconfig.xml to add the suggest plugin and now I want to update to
> reflect this change. I was able to stop solr using “$ docker exec -it
> my_solr stop all” but now I can’t figure out how to restart.
>
> Thanks,
> Christine Buckler
>
>


SolrCloud

2017-12-15 Thread John Davis
Hello,
We are thinking about migrating to SolrCloud. Our current setup is:
1. Multiple replicas and shards.
2. Each query typically hits a single shard only.
3. We have an external system that assigns a document to a shard based on
it's origin and is also used by solr clients when querying to find the
correct shard to query.

It looks like the biggest advantage of SolrCloud is #3 - to route document
to the correct shard & replicas when indexing and to route query similarly.
Given we already have a fairly reliable system to do this, are there other
benefits from migrating to SolrCloud?

Thanks,
John


legacy replication

2017-12-15 Thread David Hastings
So i dont step on the other thread, I want to be assured whether or not
legacy master/slave/repeater replication will continue to be supported in
future solr versions. Our infrastructure is set up for this and all the HA
redundancies that solrcloud provides. We have already spent a lot of time
and resources with very expensive servers to handle solr in standalone
mode.

thanks.
-David



Re: Solr upgrade from 4.x to 7.1

2017-12-15 Thread Drooy Drooy
Hi Erick/Robi,

Thanks for your replies! one more question, if I go with solrcloud and
having PULL/TLOG replication mixed in the cluster, by the documentation,
this would have Master/Slave mode benefit as well,  does that seem feasible
to you? are there any projects adopting this new feature in 7.0/7.1 ?

Thanks
Drooy


On Thu, Dec 14, 2017 at 7:15 PM, Erick Erickson 
wrote:

> Completely agree with Robert. I'd also add that you should _not_ copy
> your configs from 4x. Start with the 7x configs and add any
> customizations but don't change things like luceneMatchVersion and the
> like.
>
> If you simply _cannot_ reindex, take a look at
> https://github.com/cominvent/solr-tools/blob/master/
> upgradeindex/upgradeindex.sh
>
> On Thu, Dec 14, 2017 at 2:24 PM, Petersen, Robert (Contr)
>  wrote:
> > From what I have read, you can only upgrade to the next major version
> number without using a tool to convert the indexes to the newer version.
> But that is still perilous due to deprecations etc
> >
> >
> > So I think best advice out there is to spin up a new farm on 7.1
> (especially from 4.x), make a new collection there, reindex everything into
> it and then switch over to the new farm. I would also ask the question are
> you thinking to go to master/slave on 7.1? Wouldn't you want to go with
> solr cloud?
> >
> >
> > I started with master/slave and yes it is simpler but there is that one
> single point of failure (the master) for indexing, which is of course
> easily manually overcome by repurposing a slave as the new master and
> repointing the remaining slaves at the new master however this is a
> completely manual process you try to avoid in cloud mode.
> >
> >
> > I think you'd need to think this through more fully with the new
> possibilities available and how you'd want to migrate given your existing
> environment is so far behind.
> >
> >
> > Thanks
> >
> > Robi
> >
> > 
> > From: Drooy Drooy 
> > Sent: Thursday, December 14, 2017 1:27:53 PM
> > To: solr-user@lucene.apache.org
> > Subject: Solr upgrade from 4.x to 7.1
> >
> > Hi All,
> >
> > We have an in-house project running in Solr 4.7 with Master/Slave mode
> for
> > a few years, what is it going to take to upgrade it to SolrCloud with
> > TLOG/PULL replica mode ?
> >
> > I read the upgrade guides, none of them talking about the jump from 4.x
> to
> > 7.
> >
> > Thanks much
> >
> > 
> >
> > This communication is confidential. Frontier only sends and receives
> email on the basis of the terms set out at http://www.frontier.com/email_
> disclaimer.
>


Re: SOLR Rest API for monitoring

2017-12-15 Thread Shawn Heisey
On 12/14/2017 2:27 PM, Abhi Basu wrote:
> I am using CDH 5.13 with Solr 4.10. Trying to automate metrics gathering
> for JVM (CPU, RAM, Storage etc.) by calling the REST APIs described here ->
> https://lucene.apache.org/solr/guide/6_6/metrics-reporting.html.
>
> Are these not supported in my version of Solr? If not, what option do I
> have?

The metrics API was added in version 6.4.  The documentation page you
referenced is for version 6.6.

https://issues.apache.org/jira/browse/SOLR-9812

I think that if you check the PDF reference guide for the 4.10 version
of Solr, you will not find that information in it.

Much of what gets presented by the metrics API is available at the core
level in earlier versions.

http://server:port/solr//admin/mbeans?stats=true&wt=json&indent=true

There are other APIs that you can call for information.  There are a
number of global API endpoints like /solr/admin/info/system that I
*think* are there in 4.10, but I'm not sure.  If you check Plugins/Stats
in the admin UI (or the mbeans URL referenced earlier), you should
be able to get a full list of every endpoint that is available at the
core level, but I am not aware of anything that lists the global handlers.

Thanks,
Shawn



Re: Solr upgrade from 4.x to 7.1

2017-12-15 Thread Shawn Heisey
On 12/14/2017 2:27 PM, Drooy Drooy wrote:
> We have an in-house project running in Solr 4.7 with Master/Slave mode for
> a few years, what is it going to take to upgrade it to SolrCloud with
> TLOG/PULL replica mode ?
>
> I read the upgrade guides, none of them talking about the jump from 4.x to
> 7.

I would strongly recommend that you treat the new version as an entirely
new deployment, not as something you're going to upgrade.  Build new
collections with configs designed from 7.x examples and reindex
completely from scratch.

Many configs that people are using with version 4 were actually designed
for earlier versions and contain things that won't even work with 5.x,
let alone 7.x.  Even if you have a config built specifically for 4.x, it
is still likely that it's not going to work at all in 7.x without at
least some minimal updates.

The number of changes you're probably going to WANT (in addition to
those that you NEED) will probably make your config incompatible with
your existing index data.  It is extremely likely that you will need to
reindex at some point.  Three major versions is an *enormous* change to
how Solr works.

Just in case you might need it, here's a wiki page about reindexing. 
You may already be aware of what that means:

https://wiki.apache.org/solr/HowToReindex

Thanks,
Shawn



Re: Is it safe to give users access to /admin/luke ?

2017-12-15 Thread Shawn Heisey
On 12/13/2017 11:51 PM, Solrmails wrote:
> Is it safe to give users access to /admin/luke? I restricted access for 
> normal users and I also restrict access per solr document (via a plugin). But 
> for some reasons users need information from /admin/luke.
> Can they destroy something or retrieve information that they shouldn't have?

In general, end users should NEVER have direct access to Solr.  Only
trusted administrators and your application should have access.  I would
even put requests to the luke handler behind the application -- write
something for the front end that pulls the information they need and
provides it to them.

If you can guarantee that /solr//admin/luke is the ONLY thing they
can get to, then it might be pretty safe, although it still might be
possible for users to bombard it with requests and create a denial of
service situation for your search engine.  If you can actually *trust*
those who have this access, you're probably OK.

Thanks,
Shawn



SOLR nested dataimport issues

2017-12-15 Thread Triveni
I am trying to import a nested XML using URLDataSource, but indexing is not
happening.
XML:

ABC
1512016450886
XYZ


access
public


access12
public12




My data-config.xml:


  
  
http://abc:123/api/sample_api.xml;
processor="XPathEntityProcessor" 
forEach="/hash" >
 
  
   http://abc:123/api/sample_api.xml;
processor="XPathEntityProcessor" forEach="/hash/xyz/xyz"
transformer="script:f1">

   
   
   
  
 

I am seeing below message when indexing:
Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.
Requests: 2 , Fetched: 3 , Skipped: 0 , Processed: 0 
Started: less than a minute ago

And in solr.log file below error:
2017-12-15 09:38:53.254 WARN  (qtp223684-13) [   x:xml_data]
o.a.s.h.d.SolrWriter Error creating document : SolrInputDocument(fields:
[createdBy=XYZ, id=ABC, _version_=1586842286933671936, _root_=ABC],
children: [SolrInputDocument(fields: [attr.attrValue=public, attr.Id=1,
attr.attrName=access, _root_=ABC, _version_=1586842286933671936]),
SolrInputDocument(fields: [attr.attrValue=public12, attr.Id=2,
attr.attrName=access12, _root_=ABC, _version_=1586842286933671936])])
org.apache.solr.common.SolrException: [doc=null] missing required field: id
at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:265)
at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:107)



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Wildcard searches with special character gives zero result

2017-12-15 Thread Michael Kuhlmann
Solr does not analyze queries with wildcards in them. So, with ch*p-seq,
it will search for terms that start with ch and end with p-seq. Since
your indexer has analyzed all tokens before, only chip and seq are in
the index.

See
https://solr.pl/en/2010/12/20/wildcard-queries-and-how-solr-handles-them/
for example.

If you really need results for such queries, I suggest to have a
copyField which is unstemmed and only tokenized on whitespaces. If you
then detect a wildcard character in your query string, search on that
field instead of the others.
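A rough sketch of that copyField approach (the type and field names below are invented):

```xml
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- whitespace-only tokenization keeps "Chip-seq" as one term -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="title_en_ws" type="text_ws" indexed="true" stored="false"/>
<copyField source="title_en" dest="title_en_ws"/>
```

A wildcard query such as title_en_ws:ch*p-seq should then match, since the whole hyphenated token is kept as one term in that field.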

-Michael

Am 15.12.2017 um 11:59 schrieb Selvam Raman:
> I am using edismax query parser.
> 
> On Fri, Dec 15, 2017 at 10:37 AM, Selvam Raman  wrote:
> 
>> Solr version - 6.4.0
>>
>> "title_en":["Chip-seq"]
>>
>> When i fired query like below
>>
>> 1) chip-seq
>> 2) chi*
>>
>> it is giving expected result, for this case one result.
>>
>> But when i am searching with wildcard it produce zero result.
>> 1) ch*p-seq
>>
>>
>> if i use escape character in '-' it creates two terms rather than single
>> term.
>>
>> --
>> Selvam Raman
>> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>>
> 
> 
> 



Re: Wildcard searches with special character gives zero result

2017-12-15 Thread Selvam Raman
I am using edismax query parser.

On Fri, Dec 15, 2017 at 10:37 AM, Selvam Raman  wrote:

> Solr version - 6.4.0
>
> "title_en":["Chip-seq"]
>
> When i fired query like below
>
> 1) chip-seq
> 2) chi*
>
> it is giving expected result, for this case one result.
>
> But when i am searching with wildcard it produce zero result.
> 1) ch*p-seq
>
>
> if i use escape character in '-' it creates two terms rather than single
> term.
>
> --
> Selvam Raman
> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>



-- 
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"


Re: No Live SolrServer available to handle this request

2017-12-15 Thread Selvam Raman
Hi Steve,

I have raised the jira ticket SOLR-11764.
I am happy to work with you to solve this problem.

Thanks,
selvam R

On Thu, Dec 7, 2017 at 2:48 PM, Steve Rowe  wrote:

> Hi Selvam,
>
> This sounds like it may be a bug - could you please create a JIRA?  (See <
> https://wiki.apache.org/solr/HowToContribute#JIRA_tips_.
> 28our_issue.2Fbug_tracker.29> for more info.)
>
> Thanks,
>
> --
> Steve
> www.lucidworks.com
>
> > On Dec 6, 2017, at 9:56 PM, Selvam Raman  wrote:
> >
> > Yes. you are right. we are using preanalyzed field and that causing the
> > problem.
> > The actual problem is preanalyzed with highlight option. if i disable
> > highlight option it works fine. Please let me know if there is work
> around
> > to solve it.
> >
> > On Wed, Dec 6, 2017 at 10:19 PM, Erick Erickson  >
> > wrote:
> >
> >> This looks like you're using "pre analyzed fields" which have a very
> >> specific format. PreAnalyzedFields are actually pretty rarely used,
> >> did you enable them by mistake?
> >>
> >> On Tue, Dec 5, 2017 at 11:37 PM, Selvam Raman  wrote:
> >>> When i look at the solr logs i find the below exception
> >>>
> >>> Caused by: java.io.IOException: Invalid JSON type java.lang.String,
> >>> expected Map
> >>> at
> >>> org.apache.solr.schema.JsonPreAnalyzedParser.parse(
> >> JsonPreAnalyzedParser.java:86)
> >>> at
> >>> org.apache.solr.schema.PreAnalyzedField$PreAnalyzedTokenizer.
> >> decodeInput(PreAnalyzedField.java:345)
> >>> at
> >>> org.apache.solr.schema.PreAnalyzedField$PreAnalyzedTokenizer.access$
> >> 000(PreAnalyzedField.java:280)
> >>> at
> >>> org.apache.solr.schema.PreAnalyzedField$PreAnalyzedAnalyzer$1.
> >> setReader(PreAnalyzedField.java:375)
> >>> at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:202)
> >>> at
> >>> org.apache.lucene.search.uhighlight.AnalysisOffsetStrategy.
> tokenStream(
> >> AnalysisOffsetStrategy.java:58)
> >>> at
> >>> org.apache.lucene.search.uhighlight.MemoryIndexOffsetStrategy.
> >> getOffsetsEnums(MemoryIndexOffsetStrategy.java:106)
> >>> ... 37 more
> >>>
> >>>
> >>>
> >>> I am setting up lot of fields (fq, score, highlight,etc) then put it
> >> into
> >>> solrquery.
> >>>
> >>> On Wed, Dec 6, 2017 at 11:22 AM, Selvam Raman 
> wrote:
> >>>
>  When i am firing query it returns the doc as expected. (Example:
>  q=synthesis)
> 
>  I am facing the problem when i include wildcard character in the
> query.
>  (Example: q=synthesi*)
> 
> 
>  org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
>  Error from server at http://localhost:8983/solr/Metadata2:
>  org.apache.solr.client.solrj.SolrServerException:
> 
>  No live SolrServers available to handle this request:[/solr/Metadata2_
>  shard1_replica1,
>   solr/Metadata2_shard2_replica2,
>   solr/Metadata2_shard1_replica2]
> 
>  --
>  Selvam Raman
>  "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
> 
> >>>
> >>>
> >>>
> >>> --
> >>> Selvam Raman
> >>> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
> >>
> >
> >
> >
> > --
> > Selvam Raman
> > "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>
>


-- 
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
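Erick's diagnosis above points at PreAnalyzedField: that field type expects each incoming value to be a JSON object (a Map) in Solr's pre-analyzed format, so feeding it a plain string produces exactly the "Invalid JSON type java.lang.String, expected Map" error in the trace. A minimal sketch of what a valid payload looks like (the stored text and token offsets here are made-up illustration values; check the pre-analyzed format documentation for your Solr version):

```python
import json

def make_preanalyzed(stored_text, tokens):
    """Build a pre-analyzed field value from (term, start, end) tuples.

    "v" is the format version, "str" the stored value, and "tokens"
    the pre-analyzed token stream with start/end character offsets.
    """
    return json.dumps({
        "v": "1",
        "str": stored_text,
        "tokens": [{"t": t, "s": s, "e": e} for (t, s, e) in tokens],
    })

value = make_preanalyzed("Chip-seq", [("chip", 0, 4), ("seq", 5, 8)])
parsed = json.loads(value)
print(parsed["v"], [tok["t"] for tok in parsed["tokens"]])  # 1 ['chip', 'seq']
```

If you never intended to supply pre-tokenized values, the simpler fix is Erick's: switch the field to an ordinary analyzed type instead of constructing payloads like this.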


Wildcard searches with special character gives zero result

2017-12-15 Thread Selvam Raman
Solr version - 6.4.0

"title_en":["Chip-seq"]

When I fire queries like the ones below:

1) chip-seq
2) chi*

they give the expected result (in this case, one document).

But when I place a wildcard before the special character, the search produces
zero results:
1) ch*p-seq


If I escape the '-' character, the query is parsed into two terms rather than
a single term.

-- 
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
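The usual explanation for this behavior is that wildcard queries are not analyzed: at index time the analyzer splits "Chip-seq" into the terms "chip" and "seq", so "chi*" matches the whole term "chip", while "ch*p-seq" is matched as one raw pattern against each whole indexed term and matches nothing. A rough simulation (the tokenizer rule here is an assumption about the field's analysis chain, not taken from the actual schema):

```python
import re
from fnmatch import fnmatchcase

def index_terms(text):
    """Mimic a typical analysis chain: lowercase, split on non-alphanumerics."""
    return {t for t in re.split(r"[^0-9a-z]+", text.lower()) if t}

def wildcard_hits(pattern, terms):
    """Wildcard queries skip analysis: the raw pattern is matched
    against each indexed term as a whole."""
    return sorted(t for t in terms if fnmatchcase(t, pattern))

terms = index_terms("Chip-seq")          # {'chip', 'seq'}
print(wildcard_hits("chi*", terms))      # ['chip'] -> one hit
print(wildcard_hits("ch*p-seq", terms))  # [] -> no indexed term contains '-'
```

Under that assumption, common workarounds are to query the sub-tokens separately (e.g. ch*p AND seq) or to index a copy of the field with an analyzer that keeps "chip-seq" as a single token (a string or keyword-tokenized field).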


Re: How to sort on dates?

2017-12-15 Thread Michael Kuhlmann
Hi Georgios,

DateRangeField is a kind of SpatialField which is not sortable at all.

For sorting, use a DatePointField instead. It's not deprecated; the
deprecated class is TrieDateField.

Best,
Michael


Am 15.12.2017 um 10:53 schrieb Georgios Petasis:
> Hi all,
> 
> I have a field of type "date_range" defined as:
> 
> <field name="..." type="date_range" multiValued="false" indexed="true" stored="true"/>
> 
> The problem is that sorting on this field does not work (despite the
> fact that I put dates in there). Instead I get an error prompting to
> perform sorting through a query.
> 
> How can I do that? There is no documentation that I could find, that
> shows an alternative.
> 
> Also, I think that I saw a warning somewhere, that DateRangeField is
> deprecated. But no alternative is suggested:
> 
> https://lucene.apache.org/solr/guide/7_1/working-with-dates.html
> 
> I am using solr 7.1.
> 
> George
> 



How to sort on dates?

2017-12-15 Thread Georgios Petasis

Hi all,

I have a field of type "date_range" defined as:

<field name="..." type="date_range" multiValued="false" indexed="true" stored="true"/>


The problem is that sorting on this field does not work (despite the
fact that I put dates in it). Instead I get an error telling me to
perform the sorting through a query.


How can I do that? There is no documentation that I could find, that 
shows an alternative.


Also, I think that I saw a warning somewhere, that DateRangeField is 
deprecated. But no alternative is suggested:


https://lucene.apache.org/solr/guide/7_1/working-with-dates.html

I am using solr 7.1.

George