Re: SOLR OutOfMemoryError Java heap space

2014-03-06 Thread Angel Tchorbadjiiski
Hi Shawn, a big thanks for the long and detailed answer. I am aware of how Linux uses free RAM for caching and the problems related to the JVM and GC. It is nice to hear how this correlates to Solr. I'll take some time and think over it. The facet.method=enum and probably a combination of

Re: need suggestions for storing TBs of structured data in SolrCloud

2014-03-06 Thread Toke Eskildsen
On Thu, 2014-03-06 at 08:17 +0100, Chia-Chun Shih wrote: 1. Raw data is 35,000 CSV files per day. Each file is about 5 MB. 2. One collection serves one day. 200-day history data is required. So once your data are indexed, they will not change? It seems to me that 1 shard/day is a fine
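
For scale, the figures quoted in this message can be worked through directly. The following is only a back-of-the-envelope sketch: it assumes decimal units (1 GB = 1,000 MB), the function name is illustrative, and raw CSV size is at best a rough proxy for the resulting index size.

```python
def raw_volume(files_per_day, mb_per_file, days):
    """Return (GB per day, TB over the whole retention window), decimal units."""
    gb_per_day = files_per_day * mb_per_file / 1000
    tb_total = gb_per_day * days / 1000
    return gb_per_day, tb_total


# Figures from the message: 35,000 files/day, about 5 MB each, 200 days kept.
gb_day, tb_total = raw_volume(35_000, 5, 200)
print(f"{gb_day:.0f} GB/day, {tb_total:.0f} TB over 200 days")
# prints "175 GB/day, 35 TB over 200 days"
```

With one collection per day, each daily collection would therefore be built from roughly 175 GB of raw input.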

Need help regarding SOLR Fulltext search

2014-03-06 Thread Raman Jhajj
Hello Everyone, Let me first introduce myself: I am Raman, a Master's student in CS. I am doing a project for my studies which needs the use of SOLR. For some reason I have to use SOLR 4.3.0 for the project. I am facing an issue with page numbers in the search result. I came across a

Re: Need help regarding SOLR Fulltext search

2014-03-06 Thread Ahmet Arslan
Hi Raman, I did a similar project; this is how: 1) Index page by page. The Solr document (unit of retrieval) will be a page. You can generate a uniqueKey by concatenating docId and pageNo, e.g. doc50_page0. With this you will have the page number information. 2) Later on you can group by document_id with
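
The per-page key described in step 1 could be built along these lines. This is a minimal sketch, not the poster's actual code: the helper names are made up for illustration, and the key layout simply follows the doc50_page0 example from the message.

```python
def page_key(doc_id, page_no):
    """Build a per-page unique key such as 'doc50_page0'."""
    return f"{doc_id}_page{page_no}"


def split_page_key(key):
    """Recover (doc_id, page_no) from a key produced by page_key."""
    doc_id, _, page = key.rpartition("_page")
    return doc_id, int(page)


print(page_key("doc50", 0))           # prints "doc50_page0"
print(split_page_key("doc50_page0"))  # prints "('doc50', 0)"
```

Splitting from the right keeps the round trip safe even if the document id itself happens to contain "_page".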

Re: Min Number Should Match (mm) and joins

2014-03-06 Thread mm
Any suggestions? Quoting m...@preselect-media.com: Hello, I'm using eDisMax to do scoring for my search results. I have a nested structure of documents. The main (parent) document with meta data and the child documents with fulltext content. So I have to join them. My qf looks like

Mixing lucene scoring and other scoring

2014-03-06 Thread Benson Margulies
Some months ago, I talked to some people at LR about this, but I can't find my notes. Imagine a function of some fields that produces a score between 0 and 1. Imagine that you want to combine this score with relevance over some more or less complex ordinary query. What are the options, given

Re: Polygon search returning Invalid Number error.

2014-03-06 Thread leevduhl
My bad, I think this error was actually a result of using the Solr Admin utility to query the index and the query I entered included the double quotes. However, this left me with a different error that I may post a question about if I cannot figure it out. -- View this message in context:

Re: Mixing lucene scoring and other scoring

2014-03-06 Thread Otis Gospodnetic
Hi Benson, http://lucene.apache.org/core/4_7_0/expressions/org/apache/lucene/expressions/Expression.html https://issues.apache.org/jira/browse/SOLR-5707 That? Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Thu, Mar 6,

Re: Replicating Between Solr Clouds

2014-03-06 Thread perdurabo
Toby Lazar wrote Unless Solr is your system of record, aren't you already replicating your source data across the WAN? If so, could you load Solr in colo B from your colo B data source? You may be duplicating some indexing work, but at least your colo B Solr would be more closely in sync

Re: Indexing huge data

2014-03-06 Thread Rallavagu
Erick, That helps so I can focus on the problem areas. Thanks. On 3/5/14, 6:03 PM, Erick Erickson wrote: Here's the easiest thing to try to figure out where to concentrate your energies. Just comment out the server.add call in your SolrJ program. Well, and any commits you're doing from

Re: Indexing huge data

2014-03-06 Thread Rallavagu
Yeah. I have thought about spitting out JSON and running it against Solr using parallel HTTP threads separately. Thanks. On 3/5/14, 6:46 PM, Susheel Kumar wrote: One more suggestion is to collect/prepare the data in CSV format (1-2 million sample depending on size) and then import data direct

Polygon search returning InvalidShapeException: incompatible dimension (2)... error.

2014-03-06 Thread leevduhl
Getting the following error when attempting to run a polygon query from the Solr Admin utility: :com.spatial4j.core.exception.InvalidShapeException: incompatible dimension (2) and values (Intersects). Only 0 values specified, code:400 My query is as follows:

Re: need suggestions for storing TBs of structured data in SolrCloud

2014-03-06 Thread Shawn Heisey
On 3/6/2014 12:17 AM, Chia-Chun Shih wrote: I am planning a system for searching TBs of structured data in SolrCloud. I need suggestions for handling such a huge amount of data in SolrCloud (e.g., number of shards per collection, number of nodes, etc.). Here are some specs of the system:

Re: SOLR OutOfMemoryError Java heap space

2014-03-06 Thread Divyang Shah
Hi, the heap problem is due to memory being full. You should remove unnecessary data and restart the server once. On Thursday, 6 March 2014 10:39 AM, Angel Tchorbadjiiski angel.tchorbadjii...@antibodies-online.com wrote: Hi Shawn, a big thanks for the long and detailed answer. I am aware of how linux

Re: Replicating Between Solr Clouds

2014-03-06 Thread Shawn Heisey
On 3/6/2014 7:54 AM, perdurabo wrote: Toby Lazar wrote Unless Solr is your system of record, aren't you already replicating your source data across the WAN? If so, could you load Solr in colo B from your colo B data source? You may be duplicating some indexing work, but at least your colo B

Race condition in Leader Election

2014-03-06 Thread KNitin
Hi, when restarting a node in SolrCloud, I run into scenarios where both the replicas for a shard get into recovering state and never come up, causing the error "No servers hosting this shard". To fix this, I either unload one core or restart one of the nodes again so that one of them becomes the

SolrCloud setup guidance

2014-03-06 Thread Priti Solanki
Hello Everyone, I would like your guidance on the following. I have a single core with 124 GB of index data. Indexing and reading are both very slow, as I have only 7 GB of RAM to support this huge data: almost 8 million documents. Hence, we thought of going to SolrCloud so that we can

Re: Replicating Between Solr Clouds

2014-03-06 Thread perdurabo
Well, I think I finally figured out how to get SolrEntityProcessor to work, but there are still some issues. I had to add a library path to solrconfig.xml, but the cores are finally coming up and I am now able to manually run a data import that does seem to index all of the documents on the

Date Range Query taking more time.

2014-03-06 Thread Vijay Kokatnur
I am working with a date range query that is not giving me fast response times. After modifying the date range construct based on several forum posts, response time is now around 200ms, down from 2-3secs. However, I was wondering if there is still some way to improve upon it as queries without date

Re: Date Range Query taking more time.

2014-03-06 Thread Ahmet Arslan
Hi, Since your range query has NOW in it, it won't be cached meaningfully. http://solr.pl/en/2012/03/05/use-of-cachefalse-and-cost-parameters/ This is untested but can you try this? q=UserID:AC10263A-E28B-99F9-0012-AAA42DDD9336 fq=Status:Booked fq=ClientID:4 fq={!cache=false

Re: Date Range Query taking more time.

2014-03-06 Thread Vijay Kokatnur
Ahmet, I have tried filter queries before to fine-tune query performance. However, whenever we use filter queries the response time goes up and remains there. With the above change, the response time was consistently around 4-5 secs. We are using the default cache settings. Are there any settings I

Re: Race condition in Leader Election

2014-03-06 Thread Mark Miller
Are you using an old version? - Mark http://about.me/markrmiller On Mar 6, 2014, at 11:50 AM, KNitin nitin.t...@gmail.com wrote: Hi, when restarting a node in SolrCloud, I run into scenarios where both the replicas for a shard get into recovering state and never come up, causing the error

Re: Date Range Query taking more time.

2014-03-06 Thread Ahmet Arslan
Hi, Did you try with non-cached filter queries before? Cached filter queries are useful when they are re-used. How often do you commit? I thought that we could do something if we disable caching of filter queries and manipulate their execution order with the cost parameter. What happens with this:

Re: Date Range Query taking more time.

2014-03-06 Thread Vijay Kokatnur
That did the trick Ahmet. The first response was around 200ms, but the subsequent queries were around 2-5ms. I tried this q=UserID:AC10263A-E28B-99F9-0012-AAA42DDD9336 fq={!cache=false cost=100}Status:Booked fq={!cache=false cost=50}ClientID:4 fq={!cache=false cost=50}[NOW/DAY TO NOW/DAY+1YEAR]
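
The request above can be assembled programmatically. The sketch below only builds the query string; it makes no HTTP call. `{!cache=false cost=N}` is Solr's local-params syntax as used in the message, while the helper name is illustrative and the `StartDate` field name is an assumption borrowed from a later reply in this thread (the range in the message above carries no field name).

```python
from urllib.parse import urlencode, parse_qsl


def build_params(user_id, filters):
    """Return a URL-encoded Solr query string with non-cached filter queries.

    filters is a list of (filter_query, cost) pairs; among non-cached
    filters, lower-cost ones are evaluated first.
    """
    params = [("q", f"UserID:{user_id}")]
    for query, cost in filters:
        params.append(("fq", f"{{!cache=false cost={cost}}}{query}"))
    return urlencode(params)


query_string = build_params(
    "AC10263A-E28B-99F9-0012-AAA42DDD9336",
    [
        ("Status:Booked", 100),
        ("ClientID:4", 50),
        # StartDate is an assumed field name, see the note above.
        ("StartDate:[NOW/DAY TO NOW/DAY+1YEAR]", 50),
    ],
)
for name, value in parse_qsl(query_string):
    print(name, "=", value)
```

Repeating the `fq` key is deliberate: Solr treats each `fq` parameter as a separate filter, which is why the parameters are kept as a list of pairs rather than a dict.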

hung threads and CLOSE_WAIT sockets

2014-03-06 Thread Avishai Ish-Shalom
Hi, We've had a strange mishap with a solr cloud cluster (version 4.5.1) where we observed high search latency. The problem appears to develop over several hours until such point where the entire cluster stopped responding properly. After investigation we found that the number of threads (both

Re: Polygon search returning InvalidShapeException: incompatible dimension (2)... error.

2014-03-06 Thread leevduhl
Ok, I think the issue here is that I need to install the JTS library. I will have that done and try again. Lee -- View this message in context: http://lucene.472066.n3.nabble.com/Polygon-search-returning-InvalidShapeException-incompatible-dimension-2-error-tp4121704p4121796.html Sent from

Re: Solr Filter Cache Size

2014-03-06 Thread Otis Gospodnetic
What Erick said. That's a giant Filter Cache. Have a look at these Solr metrics and note the Filter Cache in the middle: http://www.flickr.com/photos/otis/8409088080/ Note how small the cache is and how high the hit rate is. Those are stats for http://search-lucene.com/ and

RE: SolrCloud setup guidance

2014-03-06 Thread Susheel Kumar
Setting up Solr cloud (horizontal scaling) is definitely a good idea for this big index, but before going to Solr cloud, are you able to upgrade your single node to 128GB of memory (vertical scaling) to see the difference? Thanks, Susheel -Original Message- From: Priti Solanki

Re: Race condition in Leader Election

2014-03-06 Thread KNitin
I am using 4.3.1. On Thu, Mar 6, 2014 at 11:48 AM, Mark Miller markrmil...@gmail.com wrote: Are you using an old version? - Mark http://about.me/markrmiller On Mar 6, 2014, at 11:50 AM, KNitin nitin.t...@gmail.com wrote: Hi, when restarting a node in SolrCloud, I run into scenarios

Re: Apache Solr Configuration Problem (Japanese Language)

2014-03-06 Thread T. Kuro Kurosaka
Andy, I don't have a direct answer to your question but I have a question. On 03/05/2014 07:21 AM, Andy Alexander wrote: fq=ss_language:ja&q=製品 I am guessing you have a field called ss_language where a language code of the document is stored, and you have Solr documents of different

Re: hung threads and CLOSE_WAIT sockets

2014-03-06 Thread Mark Miller
It sounds like the distributed update deadlock issue. It’s fixed in 4.6.1 and 4.7. - Mark http://about.me/markrmiller On Mar 6, 2014, at 3:10 PM, Avishai Ish-Shalom avis...@fewbytes.com wrote: Hi, We've had a strange mishap with a solr cloud cluster (version 4.5.1) where we observed high

Re: Date Range Query taking more time.

2014-03-06 Thread Chris Hostetter
: That did the trick Ahmet. The first response was around 200ms, but the : subsequent queries were around 2-5ms. Are you really sure you want cache=false on all of those filters? While the ClientID:4 query may be something that changes significantly enough in every query to not be useful to

SolrCloud constantly crashes after upgrading to Solr 4.7

2014-03-06 Thread Martin de Vries
Hi, We have 5 Solr servers in a Cloud with about 70 cores and 12GB indexes in total (every core has 2 shards, so it's 6 GB per server). After upgrade to Solr 4.7 the Solr servers are crashing constantly (each server about one time per hour). We currently don't have any clue about the

Re: Date Range Query taking more time.

2014-03-06 Thread Ahmet Arslan
Hoss, Thanks for the correction. I missed the /DAY part and thought it was StartDate:[NOW TO NOW+1YEAR] Ahmet On Friday, March 7, 2014 12:33 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : That did the trick Ahmet. The first response was around 200ms, but the : subsequent queries

Re: SolrCloud constantly crashes after upgrading to Solr 4.7

2014-03-06 Thread Mark Miller
On Mar 6, 2014, at 5:37 PM, Martin de Vries mar...@downnotifier.com wrote: IndexSchema is using 62% of the memory but we don't know if that's a problem: That seems odd. Can you see what objects are taking all the RAM in the IndexSchema? - Mark http://about.me/markrmiller

Re: SolrCloud constantly crashes after upgrading to Solr 4.7

2014-03-06 Thread Shawn Heisey
On 3/6/2014 3:37 PM, Martin de Vries wrote: We have 5 Solr servers in a Cloud with about 70 cores and 12GB indexes in total (every core has 2 shards, so it's 6 GB per server). After upgrade to Solr 4.7 the Solr servers are crashing constantly (each server about one time per hour). We

SolrCloud recovery after nodes are rebooted in rapid succession

2014-03-06 Thread Nazik Huq
Hello, I have a question from a colleague who's managing a 3-node (VMs) SolrCloud cluster with a separate 3-node ZooKeeper ensemble. Periodically the data center underneath the SolrCloud decides to upgrade the SolrCloud instance infrastructure in a rolling upgrade fashion. So after the 1st

Re: Solr 4.7.0 - cursorMark question

2014-03-06 Thread Greg Pendlebury
* New 'cursorMark' request param for efficient deep paging of sorted result sets. See http://s.apache.org/cursorpagination; At the end of the linked doco there is an example that doesn't make sense to me, because it mentions sort=timestamp asc and is then followed by pseudo code that sorts by

Re: SolrCloud recovery after nodes are rebooted in rapid succession

2014-03-06 Thread Mark Miller
Would probably need to see some logs to say much. Need to understand why they are inoperable. What version is this? - Mark http://about.me/markrmiller On Mar 6, 2014, at 6:15 PM, Nazik Huq nazik...@yahoo.com wrote: Hello, I have a question from a colleague who's managing a

Re: Indexing huge data

2014-03-06 Thread Kranti Parisa
That's what I do: pre-create JSONs following the schema and save them in MongoDB; this is part of the ETL process. After that, just dump the JSONs into Solr using batching etc. With this you can do full and incremental indexing as well. Thanks, Kranti K. Parisa
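
The batching step mentioned here might look something like the following. This is a minimal sketch of splitting a stream of prepared documents into fixed-size batches; the helper name and batch size are illustrative, and actually sending each batch to Solr is deliberately left out.

```python
from itertools import islice


def batches(docs, size):
    """Yield lists of at most `size` documents from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Stand-in documents; in practice these would be the pre-created JSON objects.
docs = ({"id": f"doc{i}"} for i in range(5))
for batch in batches(docs, 2):
    print(len(batch), [d["id"] for d in batch])
```

Because the helper consumes an iterator lazily, the full document set never has to be held in memory at once.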

Re: Solr 4.7.0 - cursorMark question

2014-03-06 Thread Chris Hostetter
: At the end of the linked doco there is an example that doesn't make sense : to me, because it mentions sort=timestamp asc and is then followed by : pseudo code that sorts by id only. I understand that cursorMark requires Ok ... 2 things contributing to the confusion. 1) the para that refers
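
For readers following this thread, the general shape of a cursorMark loop is sketched below. It assumes only what Solr's cursor paging defines: the first request sends cursorMark=*, each response carries a nextCursorMark, and iteration ends when the returned mark equals the one that was sent. The `fetch_page` callable and the in-memory fake are hypothetical stand-ins for a real Solr request, which would also need a sort that includes the uniqueKey field.

```python
def iter_cursor(fetch_page, start="*"):
    """Yield every document by following cursor marks until they stop changing.

    fetch_page(cursor_mark) must return (docs, next_cursor_mark).
    """
    cursor = start
    while True:
        docs, next_cursor = fetch_page(cursor)
        yield from docs
        if next_cursor == cursor:  # same mark back: nothing more to read
            break
        cursor = next_cursor


def make_fake_fetch(all_docs, rows):
    """Hypothetical in-memory stand-in for a Solr request, for illustration."""
    def fetch(cursor):
        offset = 0 if cursor == "*" else int(cursor)
        page = all_docs[offset:offset + rows]
        next_cursor = str(offset + len(page)) if page else cursor
        return page, next_cursor
    return fetch


print(list(iter_cursor(make_fake_fetch(["A", "B", "C", "D"], 2))))
# prints "['A', 'B', 'C', 'D']"
```

Real cursor marks are opaque strings; the numeric offsets in the fake exist only to keep the example self-contained.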

Re: Solr 4.7.0 - cursorMark question

2014-03-06 Thread Greg Pendlebury
Thank-you, that all sounds great. My assumption about documents being missed was something like this: A,B,C,D where they are sorted by timestamp first and ID second. Say the first 'page' of results is 'A,B', and before the second page is requested both documents B + C receive update events and

Re: Date Range Query taking more time.

2014-03-06 Thread Vijay Kokatnur
My initial approach was to use the filter cache for static fields. However, when a filter query is used, every query after the first has the same response time as the first. For instance, when cache is enabled in the query under review, response time shoots up to 4-5secs and stays there. We are using

RE: SolrCloud recovery after nodes are rebooted in rapid succession

2014-03-06 Thread Nazik Huq
The version is 4.6. I am going to ask for the log files and post them. -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Thursday, March 06, 2014 6:33 PM To: solr-user Subject: Re: SolrCloud recovery after nodes are rebooted in rapid succession Would probably need

Dataimport handler Date

2014-03-06 Thread Pritesh Patel
I'm using the DataImportHandler to index data from a MySQL DB. Been running it just fine. I've been using full-imports. I'm now trying to implement the delta import functionality. To implement the delta query, you need to be reading the last_index_time from a properties file to know what new to

Re: SolrCloud setup guidance

2014-03-06 Thread Priti Solanki
Thanks Susheel, But this index will keep on growing; that is my worry, so I would always have to increase the RAM. Can you suggest how many nodes one would need to support this big index? Regards, On Fri, Mar 7, 2014 at 2:50 AM, Susheel Kumar susheel.ku...@thedigitalgroup.net wrote: Setting up Solr

What is meant by Index Searcher?

2014-03-06 Thread search engn dev
I am reading the Apache Solr Reference Guide and it has the lines below. Solr caches are associated with a specific instance of an Index Searcher, a specific view of an index that doesn't change during the lifetime of that searcher. As long as that Index Searcher is being used, any items in its

Re: What is meant by Index Searcher?

2014-03-06 Thread Alexandre Rafalovitch
That's an under-the-covers implementation detail. Unless you are doing extensions, you probably don't need to worry. Where it connects to userland is, for example, the commits. Until you commit, your records are not visible, even though Solr already has them. This is because the 'index searcher' does

Re: Dataimport handler Date

2014-03-06 Thread Gora Mohanty
On 7 March 2014 08:50, Pritesh Patel priteshpate...@gmail.com wrote: I'm using the DataImportHandler to index data from a MySQL DB. Been running it just fine. I've been using full-imports. I'm now trying to implement the delta import functionality. To implement the delta query, you need to be

SolrCloud with Tomcat

2014-03-06 Thread Vineet Mishra
Hi, I am installing SolrCloud with 3 external ZooKeeper instances (localhost:2181, localhost:2182, localhost:2183) and 2 Tomcats (localhost:8181, localhost:8182), all on a single machine (just for getting started). By following these links