Can I use multiple cores

2014-08-12 Thread Ramprasad Padmanabhan
I need to store in Solr all the data of my clients' mailing activity. The data contains metadata like From, To, Date, Time, Subject, etc. I would easily have 1000 million records every 2 months. What I am currently doing is creating cores per client, so I have 400 cores already. Is this a good idea to

Re: When I use minimum match and maxCollationTries parameters together in edismax, Solr gets stuck

2014-08-12 Thread Harun Reşit Zafer
It happens once the server is fully started. When it gets stuck, sometimes I have to restart the server, and sometimes I'm able to edit the solrconfig.xml and reload it. Harun Reşit Zafer TÜBİTAK BİLGEM BTE Bulut Bilişim ve Büyük Veri Analiz Sistemleri Bölümü T +90 262 675 3268 W

Re: Can I use multiple cores

2014-08-12 Thread Anshum Gupta
Hi Ramprasad, You can certainly have a system with hundreds of cores. I know of more than a few people who have done that successfully in their setups. At the same time, I'd also recommend that you have a look at SolrCloud. SolrCloud takes away the operational pains like replication/recovery

Re: Can I use multiple cores

2014-08-12 Thread Toke Eskildsen
On Tue, 2014-08-12 at 08:40 +0200, Ramprasad Padmanabhan wrote: I need to store in Solr all the data of my clients' mailing activity. The data contains metadata like From, To, Date, Time, Subject, etc. I would easily have 1000 million records every 2 months. If standard searches are always inside a

Re: Can I use multiple cores

2014-08-12 Thread Harshvardhan Ojha
I think this question is more aimed at the design and performance of a large number of cores. Solr is designed to handle multiple cores effectively; however, it would be interesting to know if you have observed any performance problem with a growing number of cores, with number of nodes and solr

Re: what's the difference between solr and elasticsearch in hdfs case?

2014-08-12 Thread Jianyi
Hi Alex, Thanks for your reply. I'm comparing Solr vs. ElasticSearch. Does Solr store its index on HDFS in raw Lucene format? I mean, if so, we can get the index files from HDFS and directly put them into an application based on Lucene. It seems that ElasticSearch does not store the raw

Re: When I use minimum match and maxCollationTries parameters together in edismax, Solr gets stuck

2014-08-12 Thread Harun Reşit Zafer
I tried again to make sure. The server starts and I can see the web admin GUI, but I can't navigate between tabs. It just says loading. But on the terminal console everything seems normal. Harun Reşit Zafer TÜBİTAK BİLGEM BTE Bulut Bilişim ve Büyük Veri Analiz Sistemleri Bölümü T +90 262 675 3268 W

Re: SolrCloud OOM Problem

2014-08-12 Thread Toke Eskildsen
On Tue, 2014-08-12 at 01:27 +0200, dancoleman wrote: My SolrCloud of 3 shard / 3 replicas is having a lot of OOM errors. Here are some specs on my setup: hosts: all are EC2 m1.large with 250G data volumes Is that 3 (each running a primary and a replica shard) or 6 instances? documents:

Re: Can I use multiple cores

2014-08-12 Thread Ramprasad Padmanabhan
Are there documented benchmarks with number of cores? As of now I just have a test bed. We have 150 million records (will go up to 1000 M), distributed in 400 cores. On a single machine with 16GB RAM + 16 cores, search is working fine. But I still am not sure will this work fine in production

Re: Help Required

2014-08-12 Thread Dmitry Kan
Hi, is http://wiki.apache.org/solr/Support page immutable? Dmitry On Fri, Aug 8, 2014 at 4:24 PM, Jack Krupansky j...@basetechnology.com wrote: And the Solr Support list is where people register their available consulting services: http://wiki.apache.org/solr/Support -- Jack Krupansky

Modifying date format when using TrieDateField.

2014-08-12 Thread Modassar Ather
Hi, I have a TrieDateField where I want to store a date in yyyy-MM-dd format as my source contains the date in the same format. As I understand, TrieDateField stores dates in yyyy-MM-dd'T'HH:mm:ss format, hence the date is getting formatted to the same. Kindly let me know: How can I change the

Re: Modifying date format when using TrieDateField.

2014-08-12 Thread Jack Krupansky
Use the parse date update request processor: http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/update/processor/ParseDateFieldUpdateProcessorFactory.html Additional examples are in my e-book:

Re: Can I use multiple cores

2014-08-12 Thread Toke Eskildsen
On Tue, 2014-08-12 at 11:50 +0200, Ramprasad Padmanabhan wrote: Are there documented benchmarks with number of cores? As of now I just have a test bed. We have 150 million records (will go up to 1000 M), distributed in 400 cores. On a single machine with 16GB RAM + 16 cores, search is working

Re: Can I use multiple cores

2014-08-12 Thread Ramprasad Padmanabhan
Sorry for the missing information. My Solr cores take less than 200MB of disk. What I am worried about is that if I run too many cores from a single Solr machine, there will be a limit to the number of concurrent searches it can support. I am still benchmarking for this. Also another major bottleneck I

Re: Can I use multiple cores

2014-08-12 Thread Toke Eskildsen
On Tue, 2014-08-12 at 14:14 +0200, Ramprasad Padmanabhan wrote: Sorry for the missing information. My Solr cores take less than 200MB of disk. So ~3GB/server. If you do not have special heavy queries, high query rate or heavy requirements for index availability, that really sounds like you could

Re: Modifying date format when using TrieDateField.

2014-08-12 Thread Modassar Ather
Hi Jack, Thanks for your suggestion. I think the way I am using the ParseDateFieldUpdateProcessorFactory is not right, hence the date is not getting transformed to the desired format. I added the following in solrconfig.xml and see no effect in the search result. The date is still in yyyy-MM-dd'T'HH:mm:ss

Re: Can I use multiple cores

2014-08-12 Thread Noble Paul
Hi Ramprasad, I have used it in a cluster with millions of users (1 user per core) in legacy cloud mode. We used the on-demand core loading feature where each Solr had 30,000 cores and at a time only 2000 cores were in memory. You are just hitting 400 and I don't see much of a problem. What is

Re: Can I use multiple cores

2014-08-12 Thread Aurélien MAZOYER
Hi Paul and Ramprasad, I follow your discussion with interest as I will have more or less the same requirement. When you say that you use on-demand core loading, are you talking about the LotsOfCores stuff? Erick told me that it does not work very well in a distributed environment. How do you

RE: When I use minimum match and maxCollationTries parameters together in edismax, Solr gets stuck

2014-08-12 Thread Dyer, James
Harun, What do you mean by the terminal console? Do you mean to say the admin gui freezes but you can still issue queries to solr directly through your browser? James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: Harun Reşit Zafer

Re: Can I use multiple cores

2014-08-12 Thread Ramprasad Padmanabhan
On 12 August 2014 18:18, Noble Paul noble.p...@gmail.com wrote: Hi Ramprasad, I have used it in a cluster with millions of users (1 user per core) in legacy cloud mode. We used the on-demand core loading feature where each Solr had 30,000 cores and at a time only 2000 cores were in memory.

Updates to index not available immediately as index scales, even with autoSoftCommit at 1 second

2014-08-12 Thread cwhit
I've been trying to debug through this but I'm stumped. I have a Solr index with ~40 million documents indexed currently sitting idle. I update an existing document through the web interface (collection1 - Documents - /update) and the web request returns successfully. At this point, I expect

Re: Help Required

2014-08-12 Thread Shawn Heisey
On 8/12/2014 3:57 AM, Dmitry Kan wrote: Hi, is http://wiki.apache.org/solr/Support page immutable? All pages on that wiki are changeable by end users. You just need to create an account on the wiki and then ask on this list to have your wiki username added to the Contributor group. Thanks,

RE: Can I use multiple cores

2014-08-12 Thread Toke Eskildsen
Ramprasad Padmanabhan [ramprasad...@gmail.com] wrote: I have a single machine 16GB Ram with 16 cpu cores Ah! I thought you had more machines, each with 16 Solr cores. This changes a lot. 400 Solr cores of ~200MB ~= 80GB of data. You're aiming for 7 times that, so about 500GB of data. Running
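The back-of-the-envelope estimate in the message above can be reproduced in a few lines of Python. This is only a sketch: the function name is hypothetical, and it assumes index size grows linearly with record count, which the thread does not confirm. The inputs (400 cores, ~200MB each, 150 million records growing to 1000 million) are the figures quoted in the thread.

```python
def estimate_index_size_gb(cores, mb_per_core, current_records, target_records):
    """Return (current_gb, projected_gb).

    Assumption (not from the thread): on-disk index size scales
    linearly with the number of records.
    """
    current_gb = cores * mb_per_core / 1000  # decimal GB, matching the rough numbers above
    growth = target_records / current_records
    return current_gb, current_gb * growth


current_gb, projected_gb = estimate_index_size_gb(
    cores=400,
    mb_per_core=200,
    current_records=150_000_000,
    target_records=1_000_000_000,
)
print(f"current: ~{current_gb:.0f} GB, projected: ~{projected_gb:.0f} GB")
```

With these inputs the sketch gives about 80 GB today and a little over 500 GB at the target volume, which is the same order of magnitude as the estimate in the message and far more than the 16GB of RAM on the single machine.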

Re: Modifying date format when using TrieDateField.

2014-08-12 Thread Erick Erickson
The response will always be the full specification, so you'll have yyyy-MM-dd'T'HH:mm:ss format. If you want the user to just see the yyyy-MM-dd you could use a DocTransformer to change it on the way out. You cannot change the way the dates are stored internally. The DateTransformer is just there
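As a client-side alternative to transforming the value inside Solr, the full timestamp can be trimmed after the response arrives. The sketch below is not the DocTransformer approach mentioned above; it swaps in plain string conversion in the calling application. The helper names are hypothetical, and the code assumes the value comes back as a UTC string shaped like 2014-08-12T00:00:00Z.

```python
from datetime import datetime

# Assumed shape of the date string returned in search results.
SOLR_FORMAT = "%Y-%m-%dT%H:%M:%SZ"
DISPLAY_FORMAT = "%Y-%m-%d"


def to_display_date(solr_value: str) -> str:
    """Trim a full timestamp down to the yyyy-MM-dd form shown to users."""
    return datetime.strptime(solr_value, SOLR_FORMAT).strftime(DISPLAY_FORMAT)


def to_solr_date(day: str) -> str:
    """Expand a yyyy-MM-dd source value to a full timestamp at midnight."""
    return datetime.strptime(day, DISPLAY_FORMAT).strftime(SOLR_FORMAT)


print(to_display_date("2014-08-12T00:00:00Z"))
print(to_solr_date("2014-08-12"))
```

Keeping the conversion in the client leaves the stored value untouched, which is consistent with the point that the internal date representation cannot be changed.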

Re: Updates to index not available immediately as index scales, even with autoSoftCommit at 1 second

2014-08-12 Thread Chris Hostetter
You haven't given us a lot of information to go on (ie: full solrconfig.xml, log messages around the time of your update, etc...) but my best guess would be that you are seeing a delay between the time the new searcher is opened and the time the newSearcher is made available to requests due

Re: Updates to index not available immediately as index scales, even with autoSoftCommit at 1 second

2014-08-12 Thread cwhit
I'm not seeing any messages in the log with respect to cache warming at the time, but I will investigate that possibility. Thank you. In case it is helpful, I pasted the entire solrconfig.xml at http://pastebin.com/C0iQ7E9a -- View this message in context:

Re: Updates to index not available immediately as index scales, even with autoSoftCommit at 1 second

2014-08-12 Thread Chris Hostetter
: I'm not seeing any messages in the log with respect to cache warming at the : time, but I will investigate that possibility. Thank you. In case it is what logs *do* you see at the time you send the doc? w/o details, we can't help you. : helpful, I pasted the entire solrconfig.xml at

Re: Can I use multiple cores

2014-08-12 Thread Noble Paul
The machines were 32GB RAM boxes. You must do the RAM requirement calculation for your indexes. Just the number of indexes alone won't be enough to arrive at the RAM requirement. On Tue, Aug 12, 2014 at 6:59 PM, Ramprasad Padmanabhan ramprasad...@gmail.com wrote: On 12 August 2014 18:18, Noble

Re: Updates to index not available immediately as index scales, even with autoSoftCommit at 1 second

2014-08-12 Thread cwhit
Immediately after triggering the update, this is what is in the logs: /2014-08-12 12:58:48,774 | [71] | 153414367 [qtp2038499066-4772] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={wt=json} {add=[52627624 (1476251068652322816)]} 0 34

Re: what's the difference between solr and elasticsearch in hdfs case?

2014-08-12 Thread Erick Erickson
I just pinged someone who really knows this stuff and the reply is that he's copied the index from HDFS to a local file system in order to inspect it with Luke, which means the bits on disk are identical and may freely be copied back and forth. So I'd say go for it. Erick On Tue, Aug 12, 2014

RE: Updates to index not available immediately as index scales, even with autoSoftCommit at 1 second

2014-08-12 Thread Matt Kuiper (Springblox)
Based on your solrconfig.xml settings for the filter and queryResult caches, I believe Chris's initial guess is correct. After a commit, there is likely plenty of time spent warming these caches due to the significantly high autowarm counts. filterCache class=solr.FastLRUCache

Re: SolrCloud OOM Problem

2014-08-12 Thread tuxedomoon
I have modified my instances to m2.4xlarge 64-bit with 68.4G memory. Hate to ask this but can you recommend Java memory and GC settings for 90G data and the above memory? Currently I have CATALINA_OPTS=${CATALINA_OPTS} -XX:NewSize=1536m -XX:MaxNewSize=1536m -Xms5120m -Xmx5120m -XX:+UseParNewGC

Access request

2014-08-12 Thread Vitaliy Zhovtiuk
Hello, Please provide me access. User id vzhovtyuk My email vzhovt...@gmail.com Wiki user 'Vitaliy Zhovtyuk'

ICUTokenizer acting very strangely with oriental characters

2014-08-12 Thread Shawn Heisey
The field value is this: 20世紀の100人;ポートレートアーカイブス;政治家・軍人;政治家・指導 者・軍人;[政 治],100peopeof20century,pploftwentycentury,pploftwentycentury The problem: We can't match this field with a search for 100peopeof20century. The analysis shows that there are three terms indexed at the critical point by

Re: SolrCloud OOM Problem

2014-08-12 Thread Shawn Heisey
On 8/12/2014 3:12 PM, tuxedomoon wrote: I have modified my instances to m2.4xlarge 64-bit with 68.4G memory. Hate to ask this but can you recommend Java memory and GC settings for 90G data and the above memory? Currently I have CATALINA_OPTS=${CATALINA_OPTS} -XX:NewSize=1536m

Re: Access request

2014-08-12 Thread Shawn Heisey
On 8/12/2014 3:29 PM, Vitaliy Zhovtiuk wrote: Please provide me access. User id vzhovtyuk My email vzhovt...@gmail.com Wiki user 'Vitaliy Zhovtyuk' Wiki username added to the Solr wiki contributors group. You didn't indicate exactly what kind of access you wanted, but that's the only kind of

Re: ICUTokenizer acting very strangely with oriental characters

2014-08-12 Thread Shawn Heisey
See the original message on this thread for full details. Some additional information: This happens on version 4.6.1, 4.7.2, and 4.9.0. Here is a screenshot showing the analysis problem in more detail. The first line you can see is the ICUTokenizer.

Solr query involving Street Addresses

2014-08-12 Thread Guph
I'm very new to Solr, and could use a point in the right direction on a task I've been assigned. I have a database containing customer information (phone number, email address, credit card, billing address, shipping address, etc.). I need to be able to take user-entered data, and use it to

Re: ICUTokenizer acting very strangely with oriental characters

2014-08-12 Thread Rik Tamm-Daniels
mmn jnbbbjb)n9nooon Sent from my HTC - Reply message - From: Shawn Heisey s...@elyograg.org To: solr-user@lucene.apache.org solr-user@lucene.apache.org Subject: ICUTokenizer acting very strangely with oriental characters Date: Tue, Aug 12, 2014 19:00 See the original message on

Re: ICUTokenizer acting very strangely with oriental characters

2014-08-12 Thread Steve Rowe
Shawn, ICUTokenizer is operating as designed here. The key to understanding this is o.a.l.analysis.icu.segmentation.ScriptIterator.isSameScript(), called from ScriptIterator.next() with the scripts of two consecutive characters; these methods together find script boundaries. Here’s

Re: ICUTokenizer acting very strangely with oriental characters

2014-08-12 Thread Shawn Heisey
On 8/12/2014 6:29 PM, Steve Rowe wrote: Shawn, ICUTokenizer is operating as designed here. The key to understanding this is o.a.l.analysis.icu.segmentation.ScriptIterator.isSameScript(), called from ScriptIterator.next() with the scripts of two consecutive characters; these methods

Re: ICUTokenizer acting very strangely with oriental characters

2014-08-12 Thread Steve Rowe
In the table below, the IsSameS? (is same script) and SBreak? (script break = not IsSameS?) decisions are based on what I mentioned in my previous message, and the WBreak? (word break) decision is based on UAX#29 word break rules: Char | Code Point | Script | IsSameS? | SBreak? | WBreak?

Re: what's the difference between solr and elasticsearch in hdfs case?

2014-08-12 Thread Jianyi
Thanks Erick. I will try. -- View this message in context: http://lucene.472066.n3.nabble.com/what-s-the-difference-between-solr-and-elasticsearch-in-hdfs-case-tp4152413p4152626.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Can I use multiple cores

2014-08-12 Thread Ramprasad Padmanabhan
And how many machines running the SOLR ? On 12 August 2014 22:12, Noble Paul noble.p...@gmail.com wrote: The machines were 32GB ram boxes. You must do the RAM requirement And how many machines running the SOLR ? I expect that I will have to add more servers. What I am looking for is how