Re: Solr cluster doesn't recover from a ZK disconnect if collection.reload() was issued

2016-01-14 Thread Shalin Shekhar Mangar
Which version of Solr is this on? On Thu, Jan 14, 2016 at 4:10 PM, Gili Nachum wrote: > Clarification: If we restart nodes after reloading collection and before > pausing, then recovery works fine. > > On Thu, Jan 14, 2016 at 12:08 PM, Gili Nachum

Re: degrades qtime in a 20million doc collection

2016-01-14 Thread Erick Erickson
Jack: I think that was for faceting? SOLR-8096 maybe? On Thu, Jan 14, 2016 at 12:25 AM, Toke Eskildsen wrote: > On Wed, 2016-01-13 at 15:01 -0700, Anria B. wrote: > > [256GB RAM] > >> 1. Collection has 20-30 million docs. > > Just for completeness: How large is the

Re: how many document contain a shard

2016-01-14 Thread Erick Erickson
You have to prototype. Fortunately you can do that on a very small cluster, say 2 shards. Here's the long form: https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ Best, Erick On Thu, Jan 14, 2016 at 4:38 AM, Mugeesh Husain

Re: Setting of ramBufferSizeMB

2016-01-14 Thread Erick Erickson
Yep. Here's Mike's classic video: http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html The third visualization down "TieredMergePolicy" is the default. Best, Erick On Wed, Jan 13, 2016 at 6:52 PM, Zheng Lin Edwin Yeo wrote: > Hi Erick, > > Thanks

Re: error in initializing solrconfig.xml

2016-01-14 Thread Erick Erickson
Tell us a lot more. What exact error are you seeing in the Solr log? On Wed, Jan 13, 2016 at 11:50 PM, Zap Org wrote: > I have 2 running Solr nodes in my cluster; one node went down. I restarted > the Tomcat server and it's throwing an exception while initializing solrconfig.xml >

Re: Pro and cons of using Solr Cloud vs standard Master Slave Replica

2016-01-14 Thread Alessandro Benedetti
The issue linked by Erick is really interesting. Gia, to answer your further question: For such scenario we need to plan the worst case, where everything is lost. > With Master Slave is just a matter of recreating machines, reconfigure the > core, and restore a backup, and the game is done,

Re: Can we create multiple cluster in single Zookeeper instance

2016-01-14 Thread Shawn Heisey
On 1/14/2016 10:22 AM, Mugeesh Husain wrote: > I have a question: I want to create 2-3 clusters using SolrCloud with a single > ZooKeeper instance. Is it possible? Yes, if you use a chroot on the zkHost parameter for each collection.
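The chroot suggestion above can be sketched as follows. This is a minimal illustration, assuming a single ZooKeeper instance at the hypothetical address `zk1:2181` and two hypothetical chroot paths; the helper name `zk_host` is not part of Solr. Each cluster gets its own zkHost string, and the chroot path usually has to exist in ZooKeeper before Solr is started against it.

```python
def zk_host(ensemble, chroot):
    """Join ZooKeeper host:port pairs and append a chroot path.

    The result has the shape host1:port,host2:port/chroot, which is the
    form passed to Solr as the zkHost value.
    """
    if not chroot.startswith("/"):
        raise ValueError("chroot must start with '/'")
    return ",".join(ensemble) + chroot.rstrip("/")


# Hypothetical single-instance ensemble and chroot names (assumptions).
ENSEMBLE = ["zk1:2181"]

cluster_a = zk_host(ENSEMBLE, "/solr_cluster_a")
cluster_b = zk_host(ENSEMBLE, "/solr_cluster_b")

print(cluster_a)
print(cluster_b)
```

Nodes started with the first string only see the state stored under the first chroot, so the two clusters stay separate while sharing one ZooKeeper.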

Can we create multiple cluster in single Zookeeper instance

2016-01-14 Thread Mugeesh Husain
Hello, I have a question: I want to create 2-3 clusters using SolrCloud with a single ZooKeeper instance. Is it possible?

Re: indexing rich data with solr 5.3

2016-01-14 Thread Erik Hatcher
And also, bin/post can be your friend when it comes to troubleshooting or introspecting Tika parsing via /update/extract. Like this: $ bin/post -c test -params "extractOnly=true&wt=ruby&indent=yes" -out yes docs/SYSTEM_REQUIREMENTS.html java -classpath

Re: degrades qtime in a 20million doc collection

2016-01-14 Thread Anria B.
Hi all, We did try q=queryA AND queryB vs. q=queryA&fq=queryB. For all tests, we commented out caching and reloaded the core between queries to be ultra sure that we are getting good comparisons on time. We have so many unique fq's and such frequent commits that caches are always invalidated, so our

Re: Can we create multiple cluster in single Zookeeper instance

2016-01-14 Thread Anria B.
hi Mugeesh It's best to use Zookeeper as it was intended. Install, or run 3 of them independent of any Solr, then point Solr to the zookeeper cluster. You can have 1, but then, if anything happens to that 1 single node of Zookeeper, all of your Solr will be dead, until you can properly revive

Re: degrades qtime in a 20million doc collection

2016-01-14 Thread Toke Eskildsen
On Wed, 2016-01-13 at 15:01 -0700, Anria B. wrote: [256GB RAM] > 1. Collection has 20-30 million docs. Just for completeness: How large is the collection in bytes? > 2. q=*&fq=someField:SomeVal ---> takes 2.5 seconds > 3. q=someField:SomeVal --> 300ms > 4. as numFound -> infinity,

Re: how to add new node in sole cloud cluster

2016-01-14 Thread Binoy Dalal
1) point your new solr to the cloud's zookeeper using -DzkHost parameter. That's all. 2) what is the exact error? Stack trace? On Thu, 14 Jan 2016, 13:18 Zap Org wrote: > i have 2 nodes where one got down and after restarting the server it shows > error in initializing

Re: FieldCache

2016-01-14 Thread Toke Eskildsen
On Thu, 2016-01-14 at 00:18 +0000, Lewin Joy (TMS) wrote: > I am working on Solr 4.10.3 on Cloudera CDH 5.4.4 and am trying to > group results on a multivalued field, let's say "interests". ... > But after I just re-indexed the data, it started working. Grouping is not supposed to be supported

Re: It's possible up and debug solr in eclipse IDE?

2016-01-14 Thread Ramkumar R. Aiyengar
I should add to Erick's point that the test framework allows you to test HTTP APIs through an embedded Jetty instance, so you should be able to do anything that you do with a remote Solr instance from code.. On 12 Jan 2016 18:24, "Erick Erickson" wrote: > And a neater

Position increment in WordDelimiterFilter.

2016-01-14 Thread Modassar Ather
Hi, I have following definition for WordDelimiterFilter. The analysis of 3d shows following four tokens and their positions. token position 3d 1 3 1 3d 1 d 2 Please help me understand why d is at 2? Should not it also be at position

Re: Pro and cons of using Solr Cloud vs standard Master Slave Replica

2016-01-14 Thread Alessandro Benedetti
It's true that SolrCloud is adding some complexity. But a few observations: > SolrCloud has some disadvantages and can't beat the easiness and simpleness > of > Master Slave Replica. So I can only encourage to keep Master Slave Replica > in future versions. I agree, situations can happen when

Solr cluster doesn't recover from a ZK disconnect if collection.reload() was issued

2016-01-14 Thread Gili Nachum
Hi, Our Solr cluster is running on VMs that could freeze for more than the ZK tick time (it's a non-critical CI/CD pipeline running on an overloaded ESX). When this happens the node's shards will be registered as down. Then when the node is back recovery takes place, and all shard replicas end up

Fwd: solr-5.3.1 admin console not show properly

2016-01-14 Thread David Cao
Hi there, I installed and started solr following instructions from solr wiki as this ... (on a Redhat server) cd ~/ tar zxf /tmp/solr-5.3.1.tgz cd solr-5.3.1/bin ./solr start -f Solr starts fine. But when opening console in a browser ("http://server-ip:8983/solr/admin.html"), it shows a

Re: Solr cluster doesn't recover from a ZK disconnect if collection.reload() was issued

2016-01-14 Thread Gili Nachum
Clarification: If we restart nodes after reloading collection and before pausing, then recovery works fine. On Thu, Jan 14, 2016 at 12:08 PM, Gili Nachum wrote: > Hi, > > Our Solr cluster is running on VMs that could freeze for more than the ZK > tick time (it's a non

Re: Position increment in WordDelimiterFilter.

2016-01-14 Thread Binoy Dalal
I've tried out your settings and here's what I get: 3d 1 3 1 d 2 3d 2 1) can you confirm if you've made a typo while typing out your results? 2 ) you'll get the d and 3d as 2 since they're the 2nd token once 3d is split. Try the same thing with d3 and you'll get 3 and d3 at position 2 On

Re: It's possible up and debug solr in eclipse IDE?

2016-01-14 Thread Vincenzo D'Amore
A few days ago I had a NullPointerException with Solr 5.4.0. This was the exception. java.lang.NullPointerException at org.apache.solr.search.QParser.getParser(QParser.java:315) at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:159) at

RE: Pro and cons of using Solr Cloud vs standard Master Slave Replica

2016-01-14 Thread Gian Maria Ricci - aka Alkampfer
I agree that SolrCloud does not have only advantages. I understand that it offers many more features, but it introduces some complexity. One of the problems is that I've not found a simple way to back up the content of a collection so it can be restored in a disaster recovery situation. With

Re: Position increment in WordDelimiterFilter.

2016-01-14 Thread Emir Arnautovic
Hi Modassar, Why do you think it should be at position 1? In that case searching for "3 d" would not find anything. Is it what you expect? Thanks, Emir On 14.01.2016 10:15, Modassar Ather wrote: Hi, I have following definition for WordDelimiterFilter. The analysis of 3d shows following

Classes in solr_home /lib cannot import from solr/dist

2016-01-14 Thread Callum Lamb
I've got an extension jar that contains a class which extends org.apache.solr.handler.dataimport.DataSource, but it only works if it's within the solr/dist folder. When it is stored in the lib/ folder within Solr home and Solr tries to load the class, it cannot find its parent: Exception

Re: Position increment in WordDelimiterFilter.

2016-01-14 Thread Binoy Dalal
> Irrespective of that, what I want to understand is why there is an increment in position. > Should not all the terms be at the same position, as they are yielded from the same term/token? No, they won't. The positions are incremented because typically these splits are used in phrase queries which Solr might

Re: How to configure authentication in windows start script?

2016-01-14 Thread Kristine Jetzke
> > In the Linux script is an option called AUTHC_CLIENT_CONFIGURER_ARG but I > don't find anything similar for Windows... > I just realized that this option is not used anyway for the status check during startup. Any ideas what a solution would look like to make the status check pass on Windows

how many document contain a shard

2016-01-14 Thread Mugeesh Husain
Hi, I have a big number of documents (billion). I am looking for how many shards I have to create in a core. As far as I know, the capacity of a core is 100M (approx.)? Do I need to create another core and run distributed search (SolrCloud) on it? Actually I am looking for an architecture for how I should design my

Re: Position increment in WordDelimiterFilter.

2016-01-14 Thread Modassar Ather
Thanks for your responses. > Why do you think it should be at position 1? In that case searching for "3 d" > would not find anything. Is it what you expect? During search some of the results returned are not wanted. Following is the example. Search query: "3d image" Search results with 3-d image/3 d

Re: Position increment in WordDelimiterFilter.

2016-01-14 Thread Emir Arnautovic
Hi, It seems to me that you don't want to split on numbers. Maybe there are other cases where you need to so it is turned on. If there are such cases I would suggest you create test with expectations so you can check what is best working for you. It is highly likely that you will not be able

RE: Pro and cons of using Solr Cloud vs standard Master Slave Replica

2016-01-14 Thread Gian Maria Ricci - aka Alkampfer
Actually there are situations where a restore is needed. Suppose that someone makes an error and deletes all documents from a collection, or maybe deletes a series of documents, etc. I know that this is not likely to happen, but in mission critical enterprise systems, we always need a detailed

Re: Classes in solr_home /lib cannot import from solr/dist

2016-01-14 Thread sara hajili
Hi Callum. You can create a directory for your jar file anywhere, and you must set the jar file location in a <lib> tag in solrconfig.xml. Be careful to add your lib location at the end of the default <lib> tags in the Solr config, because sometimes your jar needs a class that Solr must first load itself, after

Re: How to configure authentication in windows start script?

2016-01-14 Thread Jan Høydahl
Hi, You can contribute to this JIRA issue: https://issues.apache.org/jira/browse/SOLR-8048 -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com > 14. jan. 2016 kl. 13.02 skrev Kristine Jetzke > : > >> >> In the Linux script is an option

Monitor backup progress when location parameter is used.

2016-01-14 Thread Gian Maria Ricci - aka Alkampfer
If I start a backup operation using the location parameter http://localhost:8983/solr/mycore/replication?command=backup&name=mycore&location=z:\temp\backupmycore How can I monitor when the backup operation is finished? Issuing a standard details operation http://localhost:8983/solr/mycore
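The two requests discussed in this message can be put together as in the sketch below. It only builds the URLs and does not contact a server. The helper name `replication_url` is hypothetical, the host, core name and Windows path are taken from the message, and the `name` parameter is an assumption because the URL in the archived text is garbled. Whether the `details` response actually reflects a backup written to a custom location is the open question of this thread, so treat this as a starting point for experiments rather than a confirmed monitoring recipe.

```python
from urllib.parse import urlencode


def replication_url(base, core, **params):
    """Build a URL for a core's /replication handler with encoded params."""
    return f"{base}/{core}/replication?{urlencode(params)}"


BASE = "http://localhost:8983/solr"

# Start a backup into a custom directory (path from the message above).
backup = replication_url(
    BASE,
    "mycore",
    command="backup",
    name="mycore",  # assumption: the garbled parameter was 'name'
    location=r"z:\temp\backupmycore",
)

# Ask the handler for its status afterwards.
details = replication_url(BASE, "mycore", command="details")

print(backup)
print(details)
```

Encoding the location with `urlencode` avoids problems with the colon and backslashes in the Windows path.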

Re: Pro and cons of using Solr Cloud vs standard Master Slave Replica

2016-01-14 Thread Erick Erickson
re: SolrCloud backup/restore: https://issues.apache.org/jira/browse/SOLR-5750 not committed yet, but getting attention. On Thu, Jan 14, 2016 at 6:19 AM, Gian Maria Ricci - aka Alkampfer wrote: > Actually there are situation where a restore is needed, suppose that

Re: Monitor backup progress when location parameter is used.

2016-01-14 Thread Jack Krupansky
I think the doc is wrong or at least misleading: https://cwiki.apache.org/confluence/display/solr/Making+and+Restoring+Backups+of+SolrCores "The backup operation can be monitored to see if it has completed by sending the details command to the /replication handler..." From reading the code, it

Re: Position increment in WordDelimiterFilter.

2016-01-14 Thread Jack Krupansky
Which release of Solr are you using? Last year (or so) there was a Lucene change that had the effect of keeping all terms for WDF at the same position. There was also some discussion about whether this was either a bug or a bug fix, but I don't recall any resolution. -- Jack Krupansky On Thu,

Re: indexing rich data with solr 5.3

2016-01-14 Thread Erick Erickson
No good way except to try them. For getting details on Tika parsing failures, I much prefer the SolrJ process that the link I sent you outlines. Best, Erick On Thu, Jan 14, 2016 at 7:52 AM, kostali hassan wrote: > thank you Eric I have prb with this files; last

Re: Classes in solr_home /lib cannot import from solr/dist

2016-01-14 Thread Callum Lamb
That's what I did: My solrconfig.xml has the following (I've hardcoded the version numbers for now to get regexes out of the picture): No warnings whatsoever for not finding the jars. And the jars themselves are in the right order (the second depends on the first). If I move the data import

Re: Solr Query Tuning

2016-01-14 Thread Doug Turnbull
I suppose that /get is the query by id API. I wonder if it's reasonable to expect it to be smart in SolrCloud usage. On Thursday, January 14, 2016, Doug Turnbull < dturnb...@opensourceconnections.com> wrote: > Stupid thought/question. Is there a query by id API that understands > SolrCloud
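For readers unfamiliar with the /get handler mentioned here, the sketch below builds a lookup-by-id request against it. The base URL, the collection name `mycollection` and the id `doc-42` are hypothetical, and the helper `realtime_get_url` is not a Solr or SolrJ function. The sketch says nothing about how the request is routed between shards, which is exactly what the message is wondering about.

```python
from urllib.parse import urlencode


def realtime_get_url(base, collection, doc_id):
    """Build a /get (real-time get) URL that looks a document up by id."""
    query = urlencode({"id": doc_id, "wt": "json"})
    return f"{base}/{collection}/get?{query}"


# Hypothetical values, for illustration only.
url = realtime_get_url("http://localhost:8983/solr", "mycollection", "doc-42")
print(url)
```

Comparing the response time of this request with the equivalent `q=id:...` search would be one way to test the idea on a real cluster.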

Re: solr-5.3.1 admin console not show properly

2016-01-14 Thread Jan Høydahl
> Solr problem. You probably have some kind of system-level classpath > problem where the wrong version of a critical jar is being used instead > of the jar that's included with Jetty in the Solr download. Since our bin/solr script starts Jetty using java -jar, any CLASSPATH environment

Re: Solr Query Tuning

2016-01-14 Thread Shawn Heisey
On 1/14/2016 5:20 PM, Shivaji Dutta wrote: > I am working with a customer that has about a billion documents on 20 shards. > The documents are extremely small about 100 characters each. > The insert rate is pretty good, but they are trying to fetch the document by > using SolrJ SolrQuery > >

Re: Solr Query Tuning

2016-01-14 Thread Doug Turnbull
Stupid thought/question. Is there a query by id API that understands SolrCloud routing and can simply fwd the query to the shard that would hold said document? Barring that, can one use SolrJ's routing brains to see what shard a given id would be routed to and only query that shard? -Doug On

Re: Solr Query Tuning

2016-01-14 Thread Jack Krupansky
Sounds intriguing. It would have to know for sure which query parser is being used, which might be set in the server side defaults. Over in Cassandra NoSQL database land we have the concept of "token aware load balancing policy" on the client side that does the necessary magic (requiring parsing

Solr Query Tuning

2016-01-14 Thread Shivaji Dutta
Team Thanks for all the help before. Current State I am working with a customer that has about a billion documents on 20 shards. The documents are extremely small about 100 characters each. The insert rate is pretty good, but they are trying to fetch the document by using SolrJ SolrQuery

Re: Solr Query Tuning

2016-01-14 Thread Jack Krupansky
Add debug=all to your query to see where the time is spent in the "timing" section to see which Solr search component is consuming the time. You may also have to add debug=track to get the shard-specific info. In theory, 19 of the shards should return nothing and the 20th will return a single document.
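The advice above amounts to adding a debug parameter to the request. A minimal sketch, assuming the parameter is `debug` (the parameter name is cut off in the archived text) and using a hypothetical collection and id query; `debug_query_url` is an illustrative helper, not part of SolrJ:

```python
from urllib.parse import urlencode


def debug_query_url(base, collection, q, debug="all"):
    """Build a /select URL that asks Solr to include debug output."""
    query = urlencode({"q": q, "debug": debug, "wt": "json"})
    return f"{base}/{collection}/select?{query}"


# Hypothetical collection and document id.
url = debug_query_url("http://localhost:8983/solr", "mycollection", "id:doc-42")
print(url)
```

Passing `debug="timing"` instead limits the debug output to the per-component timing section.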

Re: It's possible up and debug solr in eclipse IDE?

2016-01-14 Thread Shawn Heisey
On 1/14/2016 3:55 AM, Vincenzo D'Amore wrote: > A few days ago I had a NullPointerException with Solr 5.4.0. > > This was the exception. > > java.lang.NullPointerException at > org.apache.solr.search.QParser.getParser(QParser.java:315) at >

Re: ConcurrentUpdateSolrClient vs CloudSolrClient for bulk update to SolrCloud

2016-01-14 Thread Shivaji Dutta
Thanks Erick. On 1/13/16, 10:55 AM, "Erick Erickson" wrote: >My first thought is "yes, you're overthinking it" ;) > >Here's something to get you started for indexing >through a Java program: >https://cwiki.apache.org/confluence/display/solr/Using+SolrJ > >Of course

Re: degrades qtime in a 20million doc collection

2016-01-14 Thread Anria B.
Here are some Actual examples, if it helps wt=json=*:*=on=SolrDocumentType:"invalidValue"=timestamp=0=0=timing { "responseHeader": { "status": 0, "QTime": 590, "params": { "q": "*:*", "debug": "timing", "indent": "on",
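When comparing responses like the one above, it can help to pull the per-component times out of the timing section programmatically. The sketch below assumes the usual layout of that section (a `prepare` and a `process` phase, each holding one entry per search component with a `time` value). The sample dictionary is made up for illustration and is not data from this thread.

```python
def slowest_components(timing, phase="process"):
    """Return (component, milliseconds) pairs for one phase, slowest first."""
    section = timing.get(phase, {})
    # Skip the phase's own total, which is a plain number rather than a dict.
    times = {
        name: entry["time"]
        for name, entry in section.items()
        if isinstance(entry, dict) and "time" in entry
    }
    return sorted(times.items(), key=lambda item: item[1], reverse=True)


# Made-up sample shaped like a "timing" debug section (assumption).
sample_timing = {
    "time": 10.0,
    "prepare": {"time": 1.0, "query": {"time": 1.0}},
    "process": {"time": 9.0, "query": {"time": 7.0}, "facet": {"time": 2.0}},
}

ranked = slowest_components(sample_timing)
print(ranked)
```

With a real response, the argument would be the `timing` object found under `debug` in the parsed JSON.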

Re: Position increment in WordDelimiterFilter.

2016-01-14 Thread Modassar Ather
Thanks for your responses. > It seems to me that you don't want to split on numbers. It is not with numbers only. Even if you try to analyze WiFi it will create 4 tokens, one of which will be at position 2. So basically the issue is with the position increment, which causes a few of the queries to behave

Re: Solr cluster doesn't recover from a ZK disconnect if collection.reload() was issued

2016-01-14 Thread Gili Nachum
Oops. Got omitted. v4.7.2, plus it kept reproducing after upgrading to v4.9 (was trying to see if it was fixed later on). On Thu, Jan 14, 2016 at 5:26 PM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > Which version of Solr is this on? > > On Thu, Jan 14, 2016 at 4:10 PM, Gili Nachum

Re: It's possible up and debug solr in eclipse IDE?

2016-01-14 Thread Shawn Heisey
On 1/14/2016 5:24 PM, Shawn Heisey wrote: > That exception, especially given the lack of an error message, is very > unhelpful. The average person wouldn't be able to deduce that it was a > config problem. > > Perhaps the code in QParser that threw the NPE needs a null check, > logging/throwing a

Re: degrades qtime in a 20million doc collection

2016-01-14 Thread Anria B.
Hi Shawn, Thanks for your comprehensive answers. I really appreciate it. Just for clarity, the numbers I posted here were from tests in which we isolated one single fq and a q. These do have good times, even though it's almost 600ms. Once we are in application mode, and other fq's and facets

Re: solr error

2016-01-14 Thread Shawn Heisey
On 1/14/2016 12:08 AM, Midas A wrote: > we are continuously getting the error > "null:org.eclipse.jetty.io.EofException" > on slave . > > what could be the reason ? This error is caused by clients that disconnect the HTTP/TCP connection before Solr has responded to a request. Jetty logs this

Re: degrades qtime in a 20million doc collection

2016-01-14 Thread Shawn Heisey
On 1/14/2016 12:07 PM, Anria B. wrote: > Here are some Actual examples, if it helps > > wt=json=*:*=on=SolrDocumentType:"invalidValue"=timestamp=0=0=timing > "QTime": 590, > Now we wipe out all caches, and put the filter in q. > >

Re: solr-5.3.1 admin console not show properly

2016-01-14 Thread Jan Høydahl
Very strange, a fresh install should run without issues. Perhaps Uwe Schindler can comment on any known bugs in your IBM J9? If I were you I’d try the following * Install Oracle Java 8 or OpenJDK 8 and set JAVA_HOME accordingly * Download Solr 5.4.0 * Unpack and start Solr as before -- Jan

Re: degrades qtime in a 20million doc collection

2016-01-14 Thread Anria B.
Here is a stacktrace of when we put a in the autowarming, or in the "newSearcher" to warm up the collection after a commit. 2016-01-12 19:00:13,216 [http-nio-19082-exec-25 vaultThreadId:http-STAGE-30518-14 vaultSessionId:1E53A095AD22704 vaultNodeId:nodeId:node-2 vaultInstanceId:2228

Re: degrades qtime in a 20million doc collection

2016-01-14 Thread Shawn Heisey
On 1/14/2016 1:01 PM, Anria B. wrote: > Here is a stacktrace of when we put a in the autowarming, or in the > "newSearcher" to warm up the collection after a commit. > org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException: Error > opening new searcher. exceeded limit of

Re: solr-5.3.1 admin console not show properly

2016-01-14 Thread Shawn Heisey
On 1/14/2016 8:03 AM, David Cao wrote: > The JVM is from IBM based on jre 1.7. > > IBM J9 VM (build 2.6, JRE 1.7.0 Linux amd64-64 Compressed References > 20141216_227497 (JIT enabled, AOT enabled) > > > The box I am using is just a dev vm box, using 'root' is temporary ... The specific method

RE: FieldCache

2016-01-14 Thread Lewin Joy (TMS)
Hi Toke, Thanks for the reply. But, the grouping on multivalued is working for me even with multiple data in the multivalued field. I also tested this on the tutorial collection from the later solr version 5.3.1 , which works as well. Maybe the wiki needs to be updated? -Lewin -Original

Re: solr-5.3.1 admin console not show properly

2016-01-14 Thread David Cao
Hi Jan, The JVM is from IBM based on jre 1.7. IBM J9 VM (build 2.6, JRE 1.7.0 Linux amd64-64 Compressed References 20141216_227497 (JIT enabled, AOT enabled) The box I am using is just a dev vm box, using 'root' is temporary ... Thanks david On Thu, Jan 14, 2016 at 6:53 AM, David Cao

Fwd: indexing rich data with solr 5.3

2016-01-14 Thread kostali hassan
Thank you Erick. I have a problem with these files. Last question: how can I define or get the list of files that can't be indexed, or bad files?

Re: degrades qtime in a 20million doc collection

2016-01-14 Thread Jack Krupansky
That sounds like it. Sorry my memory is so hazy. Maybe Yonik can either confirm that that Jira is still outstanding or close it, and confirm if these symptoms are related. -- Jack Krupansky On Thu, Jan 14, 2016 at 10:54 AM, Erick Erickson wrote: > Jack: > > I think

Using available mappers in MapReduceIndexerTool

2016-01-14 Thread Douglas Rapp
Hi, I am using Solr 4.10.4, SolrCloud mode (single instance), with the indexes residing in HDFS. I am currently testing performance and scalability of the indexing process on my Hadoop cluster using the MapReduceIndexerTool. Previously, I had been testing on a smaller cluster with 3 datanodes.

Re: Classes in solr_home /lib cannot import from solr/dist

2016-01-14 Thread Shawn Heisey
On 1/14/2016 5:36 AM, Callum Lamb wrote: > I've got an extension jar that contains a class which extends from > > org.apache.solr.handler.dataimport.DataSource > > But it only works if it's within the solr/dist folder. However when stored > in the lib/ folder within Solr home. When it tries to