Zookeeper state and its effect on Solr cluster.

2015-07-27 Thread Modassar Ather
Hi, Kindly help me understand the following with respect to Solr version 5.2.1. 1. What happens to the Solr cluster if the standalone external ZooKeeper is stopped/restarted with some changes done in zoo_data during the restart? E.g. after restarting ZooKeeper the Solr configs are reloaded

RE: cache implementation?

2015-07-27 Thread cbuxbaum
Hi Mikhail, Thanks for your help. There are many improvements in Solr 5.x that we will take advantage of once we migrate. For now we are on 4.6. Thanks, Carl Buxbaum Software Architect TradeStone Software 17 Rogers St. Suite 2; Gloucester, MA 01930 P: 978-515-5128 F : 978-281-0673

Re: Basic auth

2015-07-27 Thread Noble Paul
Q. Do you know when it would be released? 5.3 will be released in another 3-4 weeks. Q. Are there any requirements that ZK authentication must be there as well? No. bq. Providing my own security.json + class/implementation to verify user/pass should work today with 5.2, right? Yes. But, if you

Re: term frequency with stemming

2015-07-27 Thread Aki Balogh
Hi Alessandro, I'm counting word frequencies on a site. All I want to do is, I want to count running and run as the same topic. It's not really fuzzy matching I believe -- i.e. I wouldn't want to match running and sprinting. I think stemming should be it.. seems to work fine now.. TY, Aki On
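A minimal sketch of the idea in this message: count words so that "running" and "run" land in the same bucket while "sprinting" stays separate. The `toy_stem` function is a hypothetical, deliberately naive stand-in for a real stemmer (it is not what Solr's stemming filters do internally), and the sample sentence is made up for illustration.

```python
from collections import Counter


def toy_stem(word):
    """Hypothetical, very naive stemmer: strips a trailing 'ing' and
    collapses a doubled final consonant ('running' -> 'runn' -> 'run')."""
    w = word.lower()
    if w.endswith("ing") and len(w) > 5:
        w = w[:-3]
        if len(w) >= 2 and w[-1] == w[-2]:
            w = w[:-1]
    return w


# Made-up sample text; a real site would be tokenized by the analysis chain.
text = "Running is fun so I run daily while others prefer sprinting"
counts = Counter(toy_stem(token) for token in text.split())

print(counts["run"])     # occurrences of 'running' and 'run' combined
print(counts["sprint"])  # 'sprinting' is counted under its own stem
```

In Solr itself the equivalent effect would come from a stemming filter in the field's analysis chain, so that both surface forms index to the same term.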

Large number of collections in SolrCloud

2015-07-27 Thread Olivier
Hi, I have a SolrCloud cluster with 3 nodes: 3 shards per node and a replication factor of 3. There are around 1000 collections. All the collections use the same ZooKeeper configuration. So when I create each collection, the ZK configuration is pulled from ZK and the configuration files are

Re: Solr Cloud: Duplicate documents in multiple shards

2015-07-27 Thread Erick Erickson
Hmmm, with that setup you should _not_ be getting duplicate documents. So, when you see duplicate documents, you're seeing the exact same UUID on two shards, correct? My best guess is that you've done something innocent-seeming (that perhaps you forgot!) that resulted in this. Otherwise there

Re: Large number of collections in SolrCloud

2015-07-27 Thread Shawn Heisey
On 7/27/2015 9:16 AM, Olivier wrote: I have a SolrCloud cluster with 3 nodes: 3 shards per node and a replication factor of 3. There are around 1000 collections. All the collections use the same ZooKeeper configuration. So when I create each collection, the ZK configuration is pulled from

Re: Zookeeper state and its effect on Solr cluster.

2015-07-27 Thread Shawn Heisey
On 7/27/2015 6:17 AM, Modassar Ather wrote: Kindly help me understand the following with respect to Solr version 5.2.1. 1. What happens to the Solr cluster if the standalone external ZooKeeper is stopped/restarted with some changes done in zoo_data during the restart? E.g. after restarting

Re: Zookeeper state and its effect on Solr cluster.

2015-07-27 Thread Erick Erickson
Frankly, I do not know. I would reload all my collections as a preventative measure though. I'm not sure that this is a scenario that's been actively supported and therefore would not rely on rebooting ZooKeeper to cause all my collections to reload (and thus get, say, any config changes).
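The explicit reload suggested here could be scripted along these lines. This is only a sketch: it builds Collections API RELOAD URLs and prints them rather than sending anything, and the base URL and collection names are assumptions, not values from the thread.

```python
from urllib.parse import urlencode


def reload_urls(base_url, collections):
    """Build one Collections API RELOAD URL per collection name."""
    return [
        f"{base_url}/admin/collections?"
        + urlencode({"action": "RELOAD", "name": name})
        for name in collections
    ]


# Assumed host, port and collection names, for illustration only.
urls = reload_urls("http://localhost:8983/solr", ["collection1", "collection2"])
for url in urls:
    print(url)
```

Each printed URL can then be requested with any HTTP client once the new configuration has been uploaded to ZooKeeper.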

Re: Large number of collections in SolrCloud

2015-07-27 Thread Erick Erickson
AFAIK, the shareSchema flag, which shares the same internal schema object, is _NOT_ honored in SolrCloud mode. Could you raise a JIRA to the effect of "Investigate honoring shareSchema in Cloud mode"? Please add in a note about the case you're seeing. Not promising to actually work on it, but it

Re: serious JSON Facet bug

2015-07-27 Thread Harry Yoo
Yes, I see the problem on my production Solr. I set 10,240 as the max and I see the current size is 228,940, 22x bigger than the max. On Jul 23, 2015, at 8:43 PM, Yonik Seeley ysee...@gmail.com wrote: On Thu, Jul 23, 2015 at 5:00 PM, Harry Yoo hyunat...@gmail.com wrote: Is there a way to patch? I

Re: Zookeeper state and its effect on Solr cluster.

2015-07-27 Thread Modassar Ather
Thanks for your response Erick and Shawn. We had automated future Solr/ZooKeeper upgrades using scripts. So for any new version of Solr/ZooKeeper we use those scripts. While upgrading ZooKeeper we do stop it to install it as a service and then apply the new distribution (which is currently

Stemming Issue

2015-07-27 Thread EXTERNAL Taminidi Ravi (ETI, AA-AS/PAS-PTS)
Hi, I am using the solr.KStemFilterFactory in my Solr schema for a custom field type. When I use the Solr Analysis interface to analyze words, I get strange behavior. E.g., if I add the keyword Supplies, I do not get anything like Supply. Is this behavior because of KStem, is

Dollar signs in field names

2015-07-27 Thread Thomas Seidl
Hi all, I've used dollar signs in field names for several years now, as an easy way to escape bad characters (like colons) coming in from the original source of the data, and I've never had any problems. Since I don't know of any Solr request parameters that use a dollar sign as a special

Rendering Solr JSON results from outside of Velocity

2015-07-27 Thread I2R
Hi people, I have to build an application that will use Velocity templates for searching and displaying of results. However, the searches must be first pre-processed and analyzed using an external webservice implemented in Python. This module is also in charge of searching the results and can

Re: Zookeeper state and its effect on Solr cluster.

2015-07-27 Thread Erick Erickson
Why are you doing this? It seems like you're making it _much_ more difficult than necessary. Sure, automate all the non-solr stuff, but why not make your scripts use the ZK upload/download process that's well established and tested for maintaining the Solr specific data? Best, Erick On Mon, Jul

Re: SolrJ/Tika custom indexer not indexing CERTAIN .doc text?

2015-07-27 Thread Paden
Pretty old thread, I know. But in the end it wasn't Solr. I'm fairly certain that it was Tika. The autoparser wasn't pulling any of the .doc file text. It came out as just blank. The documents were 1997-2003. When I opened them in Word 2010 and RESAVED them as 2010 documents they indexed just

Re: Dollar signs in field names

2015-07-27 Thread Erick Erickson
The problem has been that field naming conventions weren't _ever_ defined strictly. It's not that anyone is taking away the ability to use other characters, rather it's codifying what's always been true; Solr isn't guaranteed to play nice with naming conventions other than those specified on the

Help with separate root entities in DIH - One each for full and delta import.

2015-07-27 Thread Bade, Vidya (Sagar)
Hi, I am currently using Solr 4.10.2 and having issues with Delta-imports. For some reason delta seems to be inconsistent when using query caching. I am using SqlEntityProcessor. To overcome the issue I want to try having two root entities - one each for full and delta. Can someone help with a

Re: SolrJ/Tika custom indexer not indexing CERTAIN .doc text?

2015-07-27 Thread Alexandre Rafalovitch
Thank you for the update. The MSWord format changed significantly from .doc to .docx so has a different parser I suspect. I would not be surprised if old binary-format parser would miss something exotic in the documents (e.g. content of text boxes or frames). Regards, Alex. Solr

Re: Basic auth

2015-07-27 Thread Fadi Mohsen
Thank you, I tested providing my implementation of authentication in security.json, uploaded file to ZK (just considering authentication), started nodes and it worked like a charm. That required of course turning off Jetty basic auth. Although I'm not sure why you took this approach instead of

Re: Help with separate root entities in DIH - One each for full and delta import.

2015-07-27 Thread Shawn Heisey
On 7/27/2015 1:37 PM, Bade, Vidya (Sagar) wrote: I am currently using Solr 4.10.2 and having issues with Delta-imports. For some reason delta seems to be inconsistent when using query caching. I am using SqlEntityProcessor. To overcome the issue I want to try having two root entities - one

custom aggregate function

2015-07-27 Thread Tavazoei, Masoud
Hi, I am working on a project in which I need to first facet my data and create buckets and then apply a custom function to the aggregated stats. More specifically I need to look up the number of items in each bucket in an external table and return a normalized value. How can I apply a

RE: Migrating junit tests from Solr 4.5.1 to Solr 5.2.1

2015-07-27 Thread Rich Hume
Thanks Erick! In addition to looking at TestCoreDiscovery.java, I also found TestSolrProperties.java to be useful. My problems basically boiled down to my having not correctly set the cores up for discovery. I had moved away from the old style solr.xml, but had not done it correctly. Thanks

Sum Aggregate Query for a particular field

2015-07-27 Thread Vineeth Dasaraju
Hi, How can I get the sum of a particular field in the documents in Solr? E.g.: [{"item": "ice cream", "price": 345}, {"item": "snickers", "price": 34}, {"item": "hersheys", "price": 5}] I want to get the total price for the items. Regards, Vineeth

Re: Stemming Issue

2015-07-27 Thread Ahmet Arslan
Hi Ravi, Do you have a lowercase filter before the KStemFilter? There are a number of stemmer implementations out there. Ahmet On Monday, July 27, 2015 7:25 PM, EXTERNAL Taminidi Ravi (ETI, AA-AS/PAS-PTS) external.ravi.tamin...@us.bosch.com wrote: Hi , I am using the

Use faceted search to drill down in hierarchical structure and omit node data outside current selection

2015-07-27 Thread PeterKerk
I have the following structure for my products, where a product may fall into multiple categories. In my case, a caketopper, which would be under cake/caketoppers as well as caketoppers (don't focus on the logic behind the category structure in this example). Category structure: cake

Re: Sum Aggregate Query for a particular field

2015-07-27 Thread naga sharathrayapati
Try these. Total sum of individual items: /select?q=*:*&wt=json&indent=true&rows=0&json.facet={itemprice:{terms:{facet:{sum:"sum(price)"},field:item,limit:100,mincount:1}}} Sum of all the items: /select?q=*:*&wt=json&indent=true&rows=0&json.facet={sum:"sum(price)"} On Mon, Jul 27, 2015 at 6:12 PM, Vineeth
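For anyone assembling these requests from a script, here is a minimal sketch that builds the same two JSON Facet queries with proper URL encoding. The helper name `sum_query_params` is hypothetical; the field names `price` and `item` come from the question, and the facet structure mirrors the queries in this reply. Nothing is sent to a server.

```python
import json
from urllib.parse import urlencode, parse_qs


def sum_query_params(field, group_by=None, limit=100):
    """Request parameters for a JSON Facet sum, optionally per term of group_by."""
    aggregate = {"sum": f"sum({field})"}
    if group_by is None:
        facet = aggregate
    else:
        # Same nesting as the query in the reply: a terms facet with a sum inside.
        facet = {
            "itemprice": {
                "terms": {
                    "field": group_by,
                    "limit": limit,
                    "mincount": 1,
                    "facet": aggregate,
                }
            }
        }
    return {
        "q": "*:*",
        "wt": "json",
        "indent": "true",
        "rows": 0,
        "json.facet": json.dumps(facet),
    }


total_query = urlencode(sum_query_params("price"))
per_item_query = urlencode(sum_query_params("price", group_by="item"))
print("/select?" + total_query)
print("/select?" + per_item_query)
```

Using `urlencode` avoids the lost `&` separators and unescaped braces that tend to creep in when such URLs are pasted by hand.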

Re: Solr Cloud: Duplicate documents in multiple shards

2015-07-27 Thread mesenthil1
Thanks Erick. As I understand now that the entire cluster goes down if any one shard is down, my first confusion is clarified. Following are the other details We really need to see details since I'm guessing we're talking past each other. So: *1 exactly how are you indexing documents?*

Re: Zookeeper state and its effect on Solr cluster.

2015-07-27 Thread Shawn Heisey
On 7/27/2015 10:21 PM, Modassar Ather wrote: Erick I am using the ZK upload process only. It is just that it is added into a script. The exception is coming when I am doing a RELOAD of collection after the ZK restart and fresh schema/solrconfig is uploaded. And once this exception occurs I

Re: Zookeeper state and its effect on Solr cluster.

2015-07-27 Thread Modassar Ather
If we upgrade ZooKeeper we need to restart. This upgrade process is automated for future releases/changes of ZooKeeper. This is a single external ZooKeeper which is completely stopped/shut down. No Solr nodes are restarted/shut down. My understanding is that even if ZooKeeper shuts down,

Re: Zookeeper state and its effect on Solr cluster.

2015-07-27 Thread Modassar Ather
Erick I am using the ZK upload process only. It is just that it is added into a script. The exception is coming when I am doing a RELOAD of collection after the ZK restart and fresh schema/solrconfig is uploaded. And once this exception occurs I have to restart the Solr nodes to get them working.

Re: SOLR server - Memory issue

2015-07-27 Thread Vishnu perumal
Thanks for the reply. On Wed, Jul 22, 2015 at 9:55 PM, Erick Erickson erickerick...@gmail.com wrote: Frankly, I'm surprised it runs at all. 650M docs in 2G of memory is very, very, very aggressive. To get it to run at all I'm guessing that you have turned off things like term vectors,

Re: term frequency with stemming

2015-07-27 Thread Alessandro Benedetti
Apart from the funny cryptic message by Darin xD, I would like to focus on the initial user requirement: get term frequencies with fuzzy matching. Solr/Lucene offer you support for fuzzy queries independently of the way you token-filter your terms at analysis time. You can run fuzzy queries with