Re: Excessive Heap Usage from docValues?

2014-03-20 Thread Toke Eskildsen
On Wed, 2014-03-19 at 22:01 +0100, tradergene wrote: I have a Solr index with about 32 million docs. Each doc is relatively small but has multiple dynamic fields that are storing INTs. The initial problem that I had to resolve is that we were running into OOMs (on a 48GB heap, 130GB on-disk

Re: w/10 ? [was: Partial Counts in SOLR]

2014-03-20 Thread Salman Akram
Yup! On Thu, Mar 20, 2014 at 5:13 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, Guessing it's the surround query parser's support for within, backed by span queries. Otis Solr & ElasticSearch Support http://sematext.com/ On Mar 19, 2014 4:44 PM, T. Kuro Kurosaka

Re: Solr memory usage off-heap

2014-03-20 Thread Avishai Ish-Shalom
Thanks! On Tue, Mar 18, 2014 at 4:37 PM, Erick Erickson erickerick...@gmail.com wrote: Avishai: It sounds like you already understand mmap. Even so, you might be interested in this excellent writeup of MMapDirectory and Lucene by Uwe:

wrong query results with wdf and ngtf

2014-03-20 Thread Andreas Owen
Is there a way to tell NGramFilterFactory while indexing that numbers should never be tokenized? Then the query should be able to find numbers. Or do I have to change the ngram-min for numbers (not alpha) to 1, if that is possible? So to speak, put the whole number in as a token and not all possible
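The behaviour asked about (n-gram ordinary words, keep numbers whole) can be sketched in plain Java. This is not an option of NGramFilterFactory; in Solr the logic would have to live in a custom token filter, which is an assumption here rather than something the thread confirms. The class name `NumberPreservingNGrams` and the gram sizes are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Lucene/Solr class): emits character n-grams for
// ordinary tokens but passes tokens made only of digits through whole.
class NumberPreservingNGrams {

    static List<String> grams(String token, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        boolean allDigits = !token.isEmpty() && token.chars().allMatch(Character::isDigit);
        if (allDigits) {
            out.add(token); // keep the whole number, e.g. "2014", as a single token
            return out;
        }
        // Every substring whose length lies in [minGram, maxGram], ordered by start
        // position (Lucene's own n-gram filter may order its output differently).
        for (int start = 0; start < token.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= token.length(); len++) {
                out.add(token.substring(start, start + len));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(grams("solr", 2, 3)); // [so, sol, ol, olr, lr]
        System.out.println(grams("2014", 2, 3)); // [2014]
    }
}
```

Because the number is indexed as one token, a query for the full number can match it exactly, while partial numbers would no longer match.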

Re: join and filter query with AND

2014-03-20 Thread Marcin Rzewucki
Nope. There is no line break in the string and it is not fed from a file. What else could be the reason? On 19 March 2014 17:57, Erick Erickson erickerick...@gmail.com wrote: It looks to me like you're feeding this from some kind of text file and you really _do_ have a line break after

Problems with the Suggest Request Handler in Solr 4.7.0

2014-03-20 Thread Steve Huckle
The Suggest Search Component that comes preconfigured in the Solr 4.7.0 solrconfig.xml seems to thread dump when I call it: http://localhost:8983/solr/suggest?spellcheck=on&q=ac&wt=json&indent=true "msg": "No suggester named default was configured". Can someone tell me what's going on there? However,

Solr dih to read Clob contents

2014-03-20 Thread Prasi S
Hi, I have a requirement to index a database table with CLOB content. Each row in my table has a column which is an XML document stored as a CLOB. I want to read the contents of the XML through DIH and map each XML tag to a separate Solr field. Below is my CLOB content: <root><author>A</author>
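The tag-to-field mapping described here can be illustrated outside DIH. The sketch assumes the CLOB has already been read into a `String` and that the XML is a flat `<root>` with one level of child elements; DIH would do this through its own configuration, so treat this only as an illustration of the mapping. The class name and the `title` element in the usage are hypothetical, since the original sample is truncated.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: turns a flat XML string (as read from a CLOB column)
// into a map of element name -> text, i.e. one entry per future Solr field.
class ClobXmlToFields {

    static Map<String, String> toFields(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("CLOB content is not well-formed XML", e);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Only element children become fields; whitespace text nodes are skipped.
            // A repeated tag overwrites the earlier value here (no multi-valued support).
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                fields.put(child.getNodeName(), child.getTextContent().trim());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String clob = "<root><author>A</author><title>Some title</title></root>";
        System.out.println(toFields(clob)); // {author=A, title=Some title}
    }
}
```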

wrong results with wdf ngtf

2014-03-20 Thread Andreas Owen
Is there a way to tell NGramFilterFactory while indexing that numbers should never be tokenized? Then the query should be able to find numbers. Or do I have to change the ngram-min for numbers (not alpha) to 1, if that is possible? So to speak, put the whole number in as a token and not all possible

Re: Solr dih to read Clob contents

2014-03-20 Thread Gora Mohanty
On 20 March 2014 14:53, Prasi S prasi1...@gmail.com wrote: Hi, I have a requirement to index a database table with CLOB content. Each row in my table has a column which is an XML document stored as a CLOB. I want to read the contents of the XML through DIH and map each XML tag to a separate Solr field,

Re: Solr4.7 No live SolrServers available to handle this request

2014-03-20 Thread Greg Walters
Sathya, I assume you're using Solr Cloud. Please provide your clusterstate.json while you're seeing this issue and check your logs for any exceptions. With no information from you it's hard to troubleshoot any issues! Thanks, Greg On Mar 20, 2014, at 12:44 AM, Sathya

wrong results with wdf ngtf

2014-03-20 Thread aowen
Is there a way to tell NGramFilterFactory while indexing that numbers should never be tokenized? Then the query should be able to find numbers. Or do I have to change the ngram-min for numbers (not alpha) to 1, if that is possible? So to speak, put the whole number in as a token and not all possible

Bootstrapping SolrCloud cluster with multiple collections in different sharding/replication setup

2014-03-20 Thread Ugo Matrangolo
Hi, I would like some advice on the best way to bootstrap, from scratch, a SolrCloud cluster housing at least two collections with different sharding/replication setups. Going through the docs and the 'Solr In Action' book, what I have seen so far is that there is a way to bootstrap a SolrCloud cluster

Re: join and filter query with AND

2014-03-20 Thread Erick Erickson
Well, the error message really looks like your input is getting chopped off. It's vaguely possible that you have some super-low limit in your servlet container configuration that is only letting very small packets through. What I'd do is look in the Solr log file to see exactly what is coming

Re: Bootstrapping SolrCloud cluster with multiple collections in different sharding/replication setup

2014-03-20 Thread Mark Miller
Honestly, the best approach is to start with no collections defined and use the collections api. If you want to preconfigure (which has its warts and will likely go away as an option), it’s tricky to do it with different numShards, as that is a global property per node. You would basically
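With the Collections API, each collection's layout comes from the parameters of its own CREATE request rather than from a node-wide numShards. The sketch below only builds the request URLs; the base URL, collection names and shard/replica counts are hypothetical, and actually sending the requests (for example with curl) is left out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Builds Collections API CREATE URLs so each collection gets its own
// numShards/replicationFactor. Names and counts are illustrative only.
class CreateCollectionUrls {

    static String createUrl(String baseUrl, String name, int numShards,
                            int replicationFactor, int maxShardsPerNode) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("action", "CREATE");
        params.put("name", name);
        params.put("numShards", Integer.toString(numShards));
        params.put("replicationFactor", Integer.toString(replicationFactor));
        params.put("maxShardsPerNode", Integer.toString(maxShardsPerNode));
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(enc(e.getKey()) + "=" + enc(e.getValue()));
        }
        return baseUrl + "/admin/collections?" + query;
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8); // Charset overload: Java 10+
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr"; // hypothetical node address
        System.out.println(createUrl(base, "products", 4, 2, 2));
        System.out.println(createUrl(base, "logs", 1, 3, 1));
    }
}
```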

Re: Bootstrapping SolrCloud cluster with multiple collections in different sharding/replication setup

2014-03-20 Thread Erick Erickson
You might find this useful: http://heliosearch.org/solrcloud-assigning-nodes-machines/ It uses the collections API to create your collection with zero nodes, then shows how to assign your leaders to specific machines (well, at least specify the nodes the leaders will be created on, it doesn't

Multilingual indexing, search results, edismax and stopwords

2014-03-20 Thread kastania44
On our Drupal multilingual system we use Apache Solr 3.5. The problem is well known from various blogs and sites I have read. The search results are not the ones we want. In our code, in hook apachesolr_query_alter, we override the defaultOperator: $query->replaceParam('mm', '90%'); The requirement is, when
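For a sense of what `mm=90%` implies: per the Solr documentation, a positive percentage is applied to the number of optional clauses and the result is rounded down. The sketch just does that arithmetic, assuming each query word becomes one optional clause, which does not necessarily hold once stopwords and several fields are involved; the class name is hypothetical.

```java
// Minimal sketch: required clause count for a plain positive-percentage mm,
// using the documented "round down" rule. Conditional specs are not modelled.
class MinShouldMatch {

    static int required(int optionalClauses, int percent) {
        // Integer division truncates, which equals rounding down for non-negative values.
        return optionalClauses * percent / 100;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " terms -> " + required(n, 90) + " must match");
        }
    }
}
```

So with 90%, a two-word query needs only one word to match, while a ten-word query needs nine.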

Re: Filter in terms component

2014-03-20 Thread Jilani Shaik
Will it work for multi-valued fields? It gives an error saying that the Field Cache will not work for multi-valued fields. Most of the data in the index is in multi-valued fields. Thanks, Jilani On Thu, Mar 20, 2014 at 1:53 AM, Ahmet Arslan iori...@yahoo.com wrote: Hi, If you just need counts maybe you can

understand debuginfo from query

2014-03-20 Thread aowen
I want the info simplified so that the user can see why a doc was found. Below is the output for a doc: 0.085597195 = (MATCH) sum of: 0.083729245 = (MATCH) max of: 0.0019158133 = (MATCH) weight(plain_text:test^10.0 in 601) [DefaultSimilarity], result of: 0.0019158133 =
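One way to simplify the explain output for end users is to pull out only the field/term pairs that contributed a weight. The sketch below does that with a regular expression over the text format shown in the post; it handles single-term `weight(...)` lines only (phrases and other clause types are ignored), and the class name and output wording are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical post-processing of debug "explain" text: collects the field:term
// pairs that contributed a weight, for a user-facing "matched because" summary.
class ExplainSummary {

    // Matches e.g. "weight(plain_text:test^10.0 in 601)"; the boost is optional.
    private static final Pattern WEIGHT =
            Pattern.compile("weight\\(([^:\\s()]+):(\\S+?)(?:\\^[0-9.]+)? in \\d+\\)");

    static Set<String> matchedTerms(String explain) {
        Set<String> terms = new LinkedHashSet<>(); // de-duplicates, keeps first-seen order
        Matcher m = WEIGHT.matcher(explain);
        while (m.find()) {
            terms.add("'" + m.group(2) + "' in " + m.group(1));
        }
        return terms;
    }

    public static void main(String[] args) {
        String explain = "0.0019158133 = (MATCH) weight(plain_text:test^10.0 in 601) "
                + "[DefaultSimilarity], result of:";
        System.out.println(matchedTerms(explain)); // ['test' in plain_text]
    }
}
```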

Shingles in Solr for bigrams, trigrams in parsed_query

2014-03-20 Thread Jyotirmoy Sundi
Hi Folks, I am using shingles to index bigrams/trigrams. The same filter is also used for queries in the schema.xml file. But when I run the query in debug mode for a collection, I don't see the bigrams in the parsed_query. Any idea what I might be missing?
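For reference, word shingles of two to three tokens over a short token stream look like the output below. This is a plain-Java illustration of the idea, not Lucene's ShingleFilter, which by default also emits the single-word tokens and may order its output differently; the sample phrase is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Word n-grams ("shingles") of minSize..maxSize consecutive tokens, joined with a space.
class Shingles {

    static List<String> shingles(List<String> tokens, int minSize, int maxSize) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            for (int size = minSize; size <= maxSize && start + size <= tokens.size(); size++) {
                out.add(String.join(" ", tokens.subList(start, start + size)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("new", "york", "city");
        System.out.println(shingles(tokens, 2, 3)); // [new york, new york city, york city]
    }
}
```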

Re: Parallel queries to Solr

2014-03-20 Thread solr2020
Thanks Shawn. When we run any SolrJ application, the message below is displayed: org.apache.solr.client.solrj.impl.HttpClientUtil createClient INFO: Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false and while restarting Solr we are getting this

Re: Bootstrapping SolrCloud cluster with multiple collections in different sharding/replication setup

2014-03-20 Thread Jeff Wartes
Please note that although the article talks about the ADDREPLICA command, that feature is coming in Solr 4.8, so don't be confused if you can't find it yet. See https://issues.apache.org/jira/browse/SOLR-5130 On 3/20/14, 7:45 AM, Erick Erickson erickerick...@gmail.com wrote: You might find

Re: Filter in terms component

2014-03-20 Thread Jilani Shaik
Hi, Please provide some more pointers to go ahead in addressing this. Thanks, Jilani On Thu, Mar 20, 2014 at 8:50 PM, Jilani Shaik jilani24...@gmail.com wrote: Will it work for multi-valued fields? It gives an error saying that the Field Cache will not work for multi-valued fields. Most of the data is

Limit on # of collections -SolrCloud

2014-03-20 Thread Chris W
Hi there, Is there a limit on the # of collections SolrCloud can support? Can ZK/SolrCloud handle 1000s of collections? Also, I see that the bootup time of SolrCloud increases with the # of cores. I do not have any expensive warm-up queries. How do I speed up Solr startup? -- Best -- C

Re: Limit on # of collections -SolrCloud

2014-03-20 Thread Shalin Shekhar Mangar
There are no arbitrary limits on the number of collections but yes there are practical limits. For example, the cluster state can become a bottleneck. There is a lot of work happening on finding and addressing these problems. See https://issues.apache.org/jira/browse/SOLR-5381 Boot up time is

Re: Solr4.7 No live SolrServers available to handle this request

2014-03-20 Thread Michael Sokolov
I'm getting a similar exception when writing documents (on the client side). I can write one document fine, but the second (which is being routed to a different shard) generates the error. It happens every time - definitely not a resource issue or timing problem since this database is

Re: Limit on # of collections -SolrCloud

2014-03-20 Thread Chris W
Thanks, Shalin. Making clusterstate.json per-collection sounds awesome. I am not having problems with #2. #3 is a major time hog in my environment. I have over 300 collections, and restarting the entire cluster takes on the order of hours (2-3 hours). Can you explain more about the

Re: Filter in terms component

2014-03-20 Thread Ahmet Arslan
Hi, I suggest you start a new thread describing your use case. Just describe the problem without assumptions, with an appropriate title/subject. Ahmet On Thursday, March 20, 2014 10:01 PM, Jilani Shaik jilani24...@gmail.com wrote: Hi, Please provide some more pointers to go ahead in

Re: Limit on # of collections -SolrCloud

2014-03-20 Thread Erick Erickson
How many total replicas are we talking here? As in how many shards and, for each shard, how many replicas? I'm not asking for a long list here, just if you have a bazillion replicas in aggregate. Hours is surprising. Best, Erick On Thu, Mar 20, 2014 at 2:17 PM, Chris W chris1980@gmail.com

Re: Limit on # of collections -SolrCloud

2014-03-20 Thread Otis Gospodnetic
Hours sounds too long indeed. We recently had a client with several thousand collections, but restart wasn't taking hours... Otis Solr & ElasticSearch Support http://sematext.com/ On Mar 20, 2014 5:49 PM, Erick Erickson erickerick...@gmail.com wrote: How many total replicas are we talking here?

Re: Limit on # of collections -SolrCloud

2014-03-20 Thread Chris W
The replication factor is two. I have equally sharded all collections across all nodes. We have a 6-node cluster: 300 collections with 6 shards each and 2 replicas per shard. I have almost 600 cores per machine. Also, my ZK timeout is on the order of 2-3 minutes. I see ZK responses very slow and
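The figure of roughly 600 cores per machine follows directly from the layout described, reading "replication factor two" as two copies of each shard in total and assuming an even spread over the six nodes. The class name is hypothetical.

```java
// Checks the per-node core count for 300 collections x 6 shards x 2 replicas
// spread evenly over 6 nodes.
class CoreCount {

    static int coresPerNode(int collections, int shardsPerCollection,
                            int replicasPerShard, int nodes) {
        int totalCores = collections * shardsPerCollection * replicasPerShard;
        return totalCores / nodes; // assumes cores are spread evenly across nodes
    }

    public static void main(String[] args) {
        System.out.println(300 * 6 * 2);                 // 3600 cores in total
        System.out.println(coresPerNode(300, 6, 2, 6)); // 600 cores per node
    }
}
```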

SOLR synonyms - Explicit mappings

2014-03-20 Thread bbi123
I need some clarification on how to define explicit mappings in the synonyms.txt file. I have been using equivalent synonyms for a while and they work as expected. I am confused by explicit mapping. I have added the synonyms below to the query analyzer. I want the search on keyword 'watch' to
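The difference between the two synonyms.txt styles can be modelled with a small parser. This is a simplified sketch for single-word entries only, not Solr's SynonymFilterFactory (which also handles multi-word entries and may merge repeated rules, whereas later lines simply overwrite here); the class name and the sample entries are hypothetical. Equivalent lines map every word to the whole group when expand=true (or to the first word when expand=false), while an explicit `=>` line replaces the left side with the right side and says nothing about the words on the right.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of the two synonyms.txt line styles (single-word entries only):
//   "a, b, c"   equivalent synonyms
//   "a => b, c" explicit mapping: "a" is replaced by "b" and "c", one direction only
class SynonymRules {

    static Map<String, List<String>> parse(List<String> lines, boolean expand) {
        Map<String, List<String>> rules = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            if (line.contains("=>")) {
                String[] sides = line.split("=>", 2);
                List<String> targets = split(sides[1]);
                for (String source : split(sides[0])) {
                    rules.put(source, targets);
                }
            } else {
                List<String> words = split(line);
                if (words.isEmpty()) {
                    continue;
                }
                List<String> targets = expand ? words : words.subList(0, 1);
                for (String w : words) {
                    rules.put(w, targets);
                }
            }
        }
        return rules;
    }

    private static List<String> split(String side) {
        List<String> out = new ArrayList<>();
        for (String s : side.split(",")) {
            if (!s.trim().isEmpty()) {
                out.add(s.trim());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "# hypothetical entries",
                "tv, television",
                "laptop => laptop, notebook");
        System.out.println(parse(lines, true));
        // {tv=[tv, television], television=[tv, television], laptop=[laptop, notebook]}
    }
}
```

Note that "notebook" gets no rule of its own: an explicit mapping only rewrites its left-hand side, and the left-hand word survives only if it is repeated on the right.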

Best approach to handle large volume of documents with constantly high incoming rate?

2014-03-20 Thread shushuai zhu
Hi, I am looking for some advice on handling a large volume of documents with a very high incoming rate. The size of each document is about 0.5 KB, the incoming rate could be more than 20K per second, and we want to store about one year's worth of documents in Solr for near-real-time searching. The goal
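A back-of-the-envelope check of the volume implied by these numbers: 20K documents per second over a 365-day year, at 0.5 KB each. The sketch takes 0.5 KB as 512 bytes and computes raw source data only; index overhead, replicas and stored-field compression would change the real on-disk size. The class name is hypothetical.

```java
import java.util.Locale;

// Raw yearly volume for ~0.5 KB documents arriving at 20K per second.
class IngestVolume {

    static long docsPerYear(long docsPerSecond) {
        long secondsPerYear = 365L * 24 * 60 * 60; // 31,536,000 seconds
        return docsPerSecond * secondsPerYear;
    }

    static double rawTebibytes(long docs, long bytesPerDoc) {
        return (double) (docs * bytesPerDoc) / Math.pow(1024, 4);
    }

    public static void main(String[] args) {
        long docs = docsPerYear(20_000);
        System.out.println(docs); // 630720000000 (about 630 billion documents)
        System.out.println(String.format(Locale.ROOT, "%.1f TiB", rawTebibytes(docs, 512))); // 293.7 TiB
    }
}
```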

Memory + WeakIdentityMap

2014-03-20 Thread Harish Agarwal
I'm transitioning my index from a 3.x version to 4.6. I'm running a large heap (20G), primarily to accommodate a large facet cache (~5G), but have been able to run it on 3.x stably. On 4.6.0, after stress testing, I'm finding that all of my shards are spending all of their time in GC. After taking

Rounding errors with SOLR score

2014-03-20 Thread William Bell
When doing complex boosting/bq we are getting rounding errors on the score. To get the score to be consistent I needed to use rint on sort: sort=rint(product(sum($p_score,$s_score,$q_score),100)) desc,s_query asc <str name="p_score">recip(priority,1,.5,.01)</str> <str
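The effect of that sort spec can be modelled client-side: the combined score is multiplied by 100 and rounded to the nearest integer, so scores that differ only beyond the second decimal place tie and the `s_query asc` key decides. This only mirrors the shape of the sort, not how Solr evaluates function queries; the document values are invented, it is an assumption that the inconsistency comes from tiny floating-point differences, and Java's `Math.rint` rounds exact halves to even.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Client-side model of:
//   sort=rint(product(sum($p_score,$s_score,$q_score),100)) desc, s_query asc
class RoundedScoreSort {

    static final class Doc {
        final String sQuery;
        final double score; // stands in for sum($p_score,$s_score,$q_score)

        Doc(String sQuery, double score) {
            this.sQuery = sQuery;
            this.score = score;
        }

        @Override
        public String toString() {
            return sQuery;
        }
    }

    // Same shape as rint(product(score, 100)): buckets scores to 0.01.
    static double bucket(double score) {
        return Math.rint(score * 100);
    }

    static final Comparator<Doc> ORDER =
            Comparator.comparingDouble((Doc d) -> bucket(d.score))
                      .reversed()                        // bucketed score desc
                      .thenComparing((Doc d) -> d.sQuery); // then s_query asc

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<>(Arrays.asList(
                new Doc("beta", 0.8571431),
                new Doc("alpha", 0.8571428),
                new Doc("gamma", 0.91)));
        docs.sort(ORDER);
        System.out.println(docs); // [gamma, alpha, beta]
    }
}
```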

Re: solr cloud distributed optimize() becomes serialized

2014-03-20 Thread William Bell
Yeah. optimize() also used to come back immediately if the index was already indexed. It just reopened the index. We used to use that for cleaning up the old directories quickly. But now it does another optimize() even though the index is already optimized. Very strange. On Tue, Mar 18, 2014

Re: Wiki edit rights

2014-03-20 Thread William Bell
Please add me too. On Tue, Mar 18, 2014 at 8:33 AM, Erick Erickson erickerick...@gmail.com wrote: Done, thanks! On Tue, Mar 18, 2014 at 3:54 AM, Anders Gustafsson anders.gustafs...@pedago.fi wrote: Yes, please. My Wiki ID is Anders Gustafsson. But yes, please, add the howto to the Wiki. You

Re: solr cloud distributed optimize() becomes serialized

2014-03-20 Thread Shalin Shekhar Mangar
That's not right. Which Solr versions are you on (question for both William and Chris)? On Fri, Mar 21, 2014 at 8:07 AM, William Bell billnb...@gmail.com wrote: Yeah. optimize() also used to come back immediately if the index was already indexed. It just reopened the index. We used to use

Re: Wiki edit rights

2014-03-20 Thread Shalin Shekhar Mangar
What's your wiki username? On Fri, Mar 21, 2014 at 8:12 AM, William Bell billnb...@gmail.com wrote: Please add me too. On Tue, Mar 18, 2014 at 8:33 AM, Erick Erickson erickerick...@gmail.com wrote: Done, thanks! On Tue, Mar 18, 2014 at 3:54 AM, Anders Gustafsson

Re: Memory + WeakIdentityMap

2014-03-20 Thread Shawn Heisey
On 3/20/2014 6:54 PM, Harish Agarwal wrote: I'm transitioning my index from a 3.x version to 4.6. I'm running a large heap (20G), primarily to accommodate a large facet cache (~5G), but have been able to run it on 3.x stably. On 4.6.0 after stress testing I'm finding that all of my shards