Can I use multiple cores
I need to store in Solr all data about my clients' mailing activity. The data contains metadata like From, To, Date, Time, Subject, etc. I would easily have 1000 million records every 2 months. What I am currently doing is creating one core per client, so I have 400 cores already. Is this a good idea? What is the general practice for creating cores?
Re: When I use minimum match and maxCollationTries parameters together in edismax, Solr gets stuck
It happens once the server is fully started. And when it gets stuck, sometimes I have to restart the server; sometimes I'm able to edit the solrconfig.xml and reload it.

Harun Reşit Zafer
TÜBİTAK BİLGEM BTE
Bulut Bilişim ve Büyük Veri Analiz Sistemleri Bölümü
T +90 262 675 3268  W http://www.hrzafer.com

On 11.08.2014 17:32, Dyer, James wrote:
> Harun,
> Just to clarify, is this happening during startup when a warmup query is running, or is this once the server is fully started? This might be another instance of https://issues.apache.org/jira/browse/SOLR-5386 .
> James Dyer, Ingram Content Group, (615) 213-4311

Original message (From: Harun Reşit Zafer, Sent: Monday, August 11, 2014 8:39 AM, To: solr-user@lucene.apache.org, Subject: When I use minimum match and maxCollationTries parameters together in edismax, Solr gets stuck):

Hi,

In the following configuration, when I uncomment both the mm and maxCollationTries lines and run a query on /select, Solr gets stuck with no exception. I tried different values for both parameters and found that values of mm below 40% still work.

  <requestHandler name="/select" class="solr.SearchHandler">
    <!-- default values for query parameters can be specified;
         these will be overridden by parameters in the request -->
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="defType">edismax</str>
      <int name="timeAllowed">1000</int>
      <str name="qf">title^3 title_s^2 content</str>
      <str name="pf">title content</str>
      <str name="fl">id,title,content,score</str>
      <float name="tie">0.1</float>
      <str name="lowercaseOperators">true</str>
      <str name="stopwords">true</str>
      <!-- <str name="mm">75%</str> -->
      <int name="rows">10</int>
      <str name="spellcheck">on</str>
      <str name="spellcheck.dictionary">default</str>
      <str name="spellcheck.dictionary">wordbreak</str>
      <str name="spellcheck.onlyMorePopular">true</str>
      <str name="spellcheck.count">5</str>
      <str name="spellcheck.maxResultsForSuggest">5</str>
      <str name="spellcheck.extendedResults">false</str>
      <str name="spellcheck.alternativeTermCount">2</str>
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.collateExtendedResults">true</str>
      <str name="spellcheck.maxCollationTries">5</str>
      <!-- <str name="spellcheck.collateParam.mm">100%</str> -->
      <str name="spellcheck.maxCollations">3</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>

Any idea? Thanks
Re: Can I use multiple cores
Hi Ramprasad,

You can certainly have a system with hundreds of cores. I know more than a few people who have done that successfully in their setups. At the same time, I'd also recommend having a look at SolrCloud. SolrCloud takes away operational pains like replication and recovery to a major extent. I don't know about your security requirements and hard bounds on that front, but look at routing in SolrCloud to figure out a multi-tenancy implementation here:

* SolrCloud Document Routing by Joel: http://searchhub.org/2013/06/13/solr-cloud-document-routing/
* Multi-level composite-id routing in SolrCloud: http://searchhub.org/2014/01/06/10590/

On Mon, Aug 11, 2014 at 11:40 PM, Ramprasad Padmanabhan ramprasad...@gmail.com wrote:
> I need to store in Solr all data about my clients' mailing activity. [...] Is this a good idea? What is the general practice for creating cores?

-- Anshum Gupta
http://www.anshumgupta.net
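For readers who want to see the composite-id routing Anshum mentions in code, here is a minimal SolrJ 4.x sketch (the collection name "mailmeta" and client id "client42" are invented): prefixing the uniqueKey with "tenant!" keeps each client's documents on one shard, and the _route_ parameter limits a query to that shard.

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.CloudSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class TenantRouting {
      public static void main(String[] args) throws Exception {
          // Collection created with the default compositeId router
          CloudSolrServer solr = new CloudSolrServer("zk1:2181,zk2:2181/solr");
          solr.setDefaultCollection("mailmeta");

          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", "client42!msg-0001");   // tenant prefix -> one shard
          doc.addField("subject_s", "Quarterly report");
          solr.add(doc);
          solr.commit();

          SolrQuery q = new SolrQuery("subject_s:report");
          q.set("_route_", "client42!");             // query only that tenant's shard
          System.out.println(solr.query(q).getResults().getNumFound());

          solr.shutdown();
      }
  }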
Re: Can I use multiple cores
On Tue, 2014-08-12 at 08:40 +0200, Ramprasad Padmanabhan wrote:
> I need to store in Solr all data about my clients' mailing activity. The data contains metadata like From, To, Date, Time, Subject, etc. I would easily have 1000 million records every 2 months.

If standard searches are always inside a single client's emails and never across all cores, this should scale simply by adding new machines linearly with the corpus size.

> What I am currently doing is creating one core per client, so I have 400 cores already. Is this a good idea?

Yes. One core per client ensures that ranking works well. It makes it easy to remove users, and if some of the users are inactive for long periods of time, you can use dynamic loading of cores. That is under the presumption that you will have a few thousand clients. If your expected scale is millions, I am not sure it will work.

- Toke Eskildsen, State and University Library, Denmark
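The dynamic core loading Toke refers to is configured in solr.xml; a minimal sketch in the legacy solr.xml format, with an invented core name. Cores marked transient are loaded on first request and evicted LRU-style once more than transientCacheSize of them are open.

  <solr persistent="true">
    <cores adminPath="/admin/cores" transientCacheSize="200">
      <core name="client42" instanceDir="client42"
            transient="true" loadOnStartup="false"/>
      <!-- ...one such entry per client core... -->
    </cores>
  </solr>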
Re: Can I use multiple cores
I think this question is aimed more at the design and performance of a large number of cores. Solr is designed to handle multiple cores effectively; however, it would be interesting to know whether you have observed any performance problems as the number of cores grows, along with your number of nodes and Solr version.

Regards,
Harshvardhan Ojha

On Tue, Aug 12, 2014 at 12:33 PM, Anshum Gupta ans...@anshumgupta.net wrote:
> Hi Ramprasad, You can certainly have a system with hundreds of cores. I know more than a few people who have done that successfully in their setups. At the same time, I'd also recommend having a look at SolrCloud. [...]
Re: what's the difference between solr and elasticsearch in hdfs case?
Hi Alex,

Thanks for your reply. I'm comparing Solr vs. Elasticsearch. Does Solr store its index on HDFS in raw Lucene format? I mean, if it works that way, we can take the index files from HDFS and put them directly into an application based on Lucene. It seems that Elasticsearch does not store the raw Lucene index on HDFS directly; it has its own special data structure and operations.
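For reference while comparing: as far as I know, Solr (4.4 and later) writes a standard Lucene index to HDFS through HdfsDirectoryFactory, configured in solrconfig.xml roughly as below. The paths are placeholders.

  <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
    <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
    <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
  </directoryFactory>
  <lockType>${solr.lock.type:hdfs}</lockType>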
Re: When I use minimum match and maxCollationTries parameters together in edismax, Solr gets stuck
I tried again to make sure. The server starts and I can see the web admin GUI, but I can't navigate between tabs; it just says "loading". On the terminal console, though, everything seems normal.

Harun Reşit Zafer
TÜBİTAK BİLGEM BTE
Bulut Bilişim ve Büyük Veri Analiz Sistemleri Bölümü
T +90 262 675 3268  W http://www.hrzafer.com

On 12.08.2014 09:42, Harun Reşit Zafer wrote:
> It happens once the server is fully started. And when it gets stuck, sometimes I have to restart the server; sometimes I'm able to edit the solrconfig.xml and reload it.

On 11.08.2014 17:32, Dyer, James wrote:
> Harun, Just to clarify, is this happening during startup when a warmup query is running, or is this once the server is fully started? This might be another instance of https://issues.apache.org/jira/browse/SOLR-5386 . [rest of the quoted thread, including the solrconfig.xml snippet, snipped]
Re: SolrCloud OOM Problem
On Tue, 2014-08-12 at 01:27 +0200, dancoleman wrote:
> My SolrCloud of 3 shards / 3 replicas is having a lot of OOM errors. Here are some specs on my setup:
> hosts: all are EC2 m1.large with 250G data volumes

Is that 3 instances (each running a primary and a replica shard) or 6?

> documents: 120M total
> zookeeper: 5 external t1.micros

If your facet fields have many unique values and you have many concurrent requests, then memory usage will be high. But by the looks of it, I guess the facet fields have relatively few values? Still, if you have many concurrent queries, you might consider using a queue in front of your SolrCloud instead of just starting new requests, in order to set an effective limit on heap usage.

- Toke Eskildsen, State and University Library, Denmark
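A minimal sketch of the queue idea Toke suggests, using SolrJ 4.x: a semaphore in front of the client caps in-flight queries, so extra callers wait in the application instead of piling heap pressure onto Solr. The limit of 8 and the zkHost string are illustrative only.

  import java.util.concurrent.Semaphore;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.CloudSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class ThrottledSearcher {
      private final Semaphore permits = new Semaphore(8); // max concurrent queries
      private final CloudSolrServer solr;

      public ThrottledSearcher(String zkHost) throws Exception {
          solr = new CloudSolrServer(zkHost);
      }

      public QueryResponse search(SolrQuery q) throws Exception {
          permits.acquire();            // callers queue here, not on Solr
          try {
              return solr.query(q);
          } finally {
              permits.release();
          }
      }
  }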
Re: Can I use multiple cores
Are there documented benchmarks for the number of cores? As of now I just have a test bed. We have 150 million records (this will go up to 1000 million), distributed across 400 cores. On a single machine with 16GB RAM + 16 CPU cores, search is working fine, but I am still not sure whether this will work in production. Obviously I can always add more nodes to Solr, but I need to justify how much hardware I need.

On 12 August 2014 12:48, Harshvardhan Ojha ojha.harshvard...@gmail.com wrote:
> I think this question is aimed more at the design and performance of a large number of cores. Solr is designed to handle multiple cores effectively; however, it would be interesting to know whether you have observed any performance problems as the number of cores grows. [...]
Re: Help Required
Hi, is the http://wiki.apache.org/solr/Support page immutable?

Dmitry

On Fri, Aug 8, 2014 at 4:24 PM, Jack Krupansky j...@basetechnology.com wrote:
> And the Solr Support list is where people register their available consulting services: http://wiki.apache.org/solr/Support
> -- Jack Krupansky

On Friday, August 8, 2014 9:12 AM, Alexandre Rafalovitch wrote (to solr-user, Subject: Re: Help Required):
> We don't mediate job offers/positions on this list. We help people learn how to make these kinds of things themselves. If you are a developer, you may find that it takes only several days to get a strong feel for Solr, especially if you start from tutorials/the right books. To find developers, using the normal job boards would probably be more efficient. That way you can list location, salary, timelines, etc.
> Regards, Alex.
> P.S. CityPantry does not actually seem to do what you are asking. They start from a postcode, though they possibly use geodistance sorting afterwards.
> P.P.S. Yes, Solr can help with distance-based sorting.
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On Fri, Aug 8, 2014 at 11:36 AM, INGRID MARSH ingridma...@btinternet.com wrote:
> Dear Sirs,
> I wonder if you can help me? I'm looking for a developer who uses Solr to build me a faceted search facility using location. In a nutshell, I need the functionality seen here: www.citypantry.com, wwwdinein. Here the vendor, via Google Maps, enters the area/radius they cover, which enables the user to enter their postcode and be presented with the vendors who serve/cover their area. Is this what Solr does? Can you put me in touch with small developers who can help?
> Thanks so much.
> Ingrid Marsh

-- Dmitry Kan
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info
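Since the distance question will come up for other readers of this thread: the distance-based sorting Alexandre mentions is native to Solr's spatial support. A sketch of a radius filter plus distance sort, assuming a LatLonType field named "location" and made-up coordinates (spaces left unencoded for readability); geofilt keeps documents within 10 km of the point and geodist() sorts them by distance.

  http://localhost:8983/solr/select?q=*:*
      &sfield=location&pt=51.5074,-0.1278
      &fq={!geofilt d=10}
      &sort=geodist() asc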
Modifying date format when using TrieDateField.
Hi,

I have a TrieDateField where I want to store a date in yyyy-MM-dd format, as my source contains the date in that same format. As I understand it, TrieDateField stores dates in yyyy-MM-dd'T'HH:mm:ss format, hence the date is getting reformatted accordingly. Kindly let me know:

How can I change the date format during indexing when using TrieDateField? How can I stop the date shifting due to the time zone? E.g. my 1972-07-03 date is getting changed to 1972-07-03T18:30:00Z when using TrieDateField.

Thanks,
Modassar
Re: Modifying date format when using TrieDateField.
Use the parse date update request processor:
http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/update/processor/ParseDateFieldUpdateProcessorFactory.html

Additional examples are in my e-book:
http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html

-- Jack Krupansky

On Tuesday, August 12, 2014 7:24 AM, Modassar Ather wrote:
> I have a TrieDateField where I want to store a date in yyyy-MM-dd format, as my source contains the date in that same format. [...]
Re: Can I use multiple cores
On Tue, 2014-08-12 at 11:50 +0200, Ramprasad Padmanabhan wrote:
> Are there documented benchmarks for the number of cores? As of now I just have a test bed. We have 150 million records (will go up to 1000 million), distributed across 400 cores. A single machine with 16GB RAM + 16 CPU cores is working fine.

About 6M records for a single machine. That is not a lot. What is a typical query rate for a core? I would guess that the CPU is idle most of the time and that you could serve quite a lot more cores from a single machine by increasing RAM or using SSDs (if you are not doing so already). How large is a typical core in GB?

> But I am still not sure whether this will work in production

16 cores is not many for a single machine, and since you can direct any search to a single core, you can scale up forever. What is it you are worried about?

> Obviously I can always add more nodes to Solr, but I need to justify how much I need.

Are you worried about cost?

- Toke Eskildsen, State and University Library, Denmark
Re: Can I use multiple cores
Sorry for the missing information. My Solr cores take less than 200MB of disk each.

What I am worried about is that if I run too many cores on a single Solr machine, there will be a limit to the number of concurrent searches it can support. I am still benchmarking this.

Another major bottleneck I've found is adding data to Solr. I have a cron job that picks data from the live MySQL DB and adds it to Solr. If I run each core's additions serially it works, but if I try a multi-process approach the addition simply hangs, even when all processes are talking to different cores. This means that beyond some point my insertion will take too long and I will have to have multiple servers. Too bad, because there is actually no problem with searching the data, only with adding it.
Re: Can I use multiple cores
On Tue, 2014-08-12 at 14:14 +0200, Ramprasad Padmanabhan wrote:
> Sorry for the missing information. My Solr cores take less than 200MB of disk each.

So ~3GB/server. If you do not have especially heavy queries, a high query rate, or heavy requirements for index availability, that really sounds like you could put a lot more cores on each machine.

> What I am worried about is that if I run too many cores on a single Solr machine, there will be a limit to the number of concurrent searches it can support. I am still benchmarking this.

By all means, benchmark! Try to pinpoint what limits the number of concurrent searches: CPU or IO?

> I have a cron job that picks data from the live MySQL DB and adds it to Solr. If I run each core's additions serially it works, but if I try a multi-process approach the addition simply hangs, even when all processes are talking to different cores.

Are you sure the problem is on the Solr end? Have you tried running the multithreaded extraction without adding the data to Solr?

- Toke Eskildsen, State and University Library, Denmark
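If the hang does turn out to be client-side, one common pattern is to let SolrJ do the batching and threading itself; a sketch with ConcurrentUpdateSolrServer, where the URL, queue size, and thread count are illustrative only.

  import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class BulkIndexer {
      public static void main(String[] args) throws Exception {
          // Buffers up to 1000 docs and streams them with 4 background threads
          ConcurrentUpdateSolrServer solr = new ConcurrentUpdateSolrServer(
                  "http://localhost:8983/solr/client42", 1000, 4);

          for (int i = 0; i < 10000; i++) {
              SolrInputDocument doc = new SolrInputDocument();
              doc.addField("id", "msg-" + i);
              doc.addField("subject_s", "test " + i);
              solr.add(doc);           // returns quickly; sending is asynchronous
          }
          solr.blockUntilFinished();   // drain the queue
          solr.commit();
          solr.shutdown();
      }
  }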
Re: Modifying date format when using TrieDateField.
Hi Jack,

Thanks for your suggestion. I think the way I am using ParseDateFieldUpdateProcessorFactory is not right, hence the date is not getting transformed to the desired format. I added the following in solrconfig.xml and see no effect in search results; the date is still in yyyy-MM-dd'T'HH:mm:ss format.

  <processor class="solr.ParseDateFieldUpdateProcessorFactory">
    <arr name="format">
      <str>yyyy-MM-dd</str>
    </arr>
  </processor>

I have the following field type defined in schema.xml. Kindly provide an example of configuring it under solrconfig.xml to get the date changed to the desired format.

  <fieldType name="tdate" class="solr.TrieDateField"
             precisionStep="0" positionIncrementGap="0"/>

Also please let me know if I am missing anything in the configuration.

Thanks,
Modassar

On Tue, Aug 12, 2014 at 5:05 PM, Jack Krupansky j...@basetechnology.com wrote:
> Use the parse date update request processor: http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/update/processor/ParseDateFieldUpdateProcessorFactory.html [...]
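One thing worth checking here, as an assumption since the full solrconfig.xml isn't shown: a bare <processor> element has no effect unless it sits inside an updateRequestProcessorChain that the update handler actually runs, roughly like this:

  <updateRequestProcessorChain name="parse-date" default="true">
    <processor class="solr.ParseDateFieldUpdateProcessorFactory">
      <arr name="format">
        <str>yyyy-MM-dd</str>
      </arr>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

Even then, as Erick explains further down the thread, this processor only parses incoming values; the stored date remains a full timestamp.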
Re: Can I use multiple cores
Hi Ramprasad,

I have used it in a cluster with millions of users (1 user per core) in legacy cloud mode. We used the on-demand core loading feature, where each Solr had 30,000 cores and at any time only 2,000 cores were in memory. You are just hitting 400, and I don't see much of a problem. What is your h/w, BTW?

On Tue, Aug 12, 2014 at 12:10 PM, Ramprasad Padmanabhan ramprasad...@gmail.com wrote:
> I need to store in Solr all data about my clients' mailing activity. [...] What is the general practice for creating cores?

--
- Noble Paul
Re: Can I use multiple cores
Hi Paul and Ramprasad,

I am following your discussion with interest, as I will have more or less the same requirement. When you say that you use on-demand core loading, are you talking about the LotsOfCores feature? Erick told me that it does not work very well in a distributed environment. How do you handle this problem? Do you use multiple standalone Solr instances? What about failover?

Thanks for your answer,
Aurelien

Le 12/08/2014 14:48, Noble Paul a écrit :
> Hi Ramprasad, I have used it in a cluster with millions of users (1 user per core) in legacy cloud mode. We used the on-demand core loading feature, where each Solr had 30,000 cores and at any time only 2,000 cores were in memory. [...]
RE: When I use minimum match and maxCollationTries parameters together in edismax, Solr gets stuck
Harun,

What do you mean by "the terminal console"? Do you mean to say the admin GUI freezes but you can still issue queries to Solr directly through your browser?

James Dyer
Ingram Content Group
(615) 213-4311

On Tuesday, August 12, 2014 2:46 AM, Harun Reşit Zafer wrote:
> I tried again to make sure. The server starts and I can see the web admin GUI, but I can't navigate between tabs; it just says "loading". On the terminal console, though, everything seems normal. [rest of the quoted thread, including the solrconfig.xml snippet, snipped]
Re: Can I use multiple cores
On 12 August 2014 18:18, Noble Paul noble.p...@gmail.com wrote:
> Hi Ramprasad, I have used it in a cluster with millions of users (1 user per core) in legacy cloud mode. We used the on-demand core loading feature, where each Solr had 30,000 cores and at any time only 2,000 cores were in memory. You are just hitting 400, and I don't see much of a problem. What is your h/w, BTW?

I have a single machine with 16GB RAM and 16 CPU cores. What is the h/w you are using?
Updates to index not available immediately as index scales, even with autoSoftCommit at 1 second
I've been trying to debug this but I'm stumped. I have a Solr index with ~40 million documents, currently sitting idle. I update an existing document through the web interface (collection1 -> Documents -> /update) and the web request returns successfully. At this point I expect the document to be updated on future searches within 1 second, but that's not the case: the update can sometimes not be visible to searches for several minutes. What could be causing this, and how can it be remedied?

Within my solrconfig.xml, I have the following commit properties set:

  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>

  <autoCommit>
    <maxTime>30</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

Running an identical Solr configuration but with thousands of documents (rather than tens of millions), the updates are available immediately.
Re: Help Required
On 8/12/2014 3:57 AM, Dmitry Kan wrote: Hi, is http://wiki.apache.org/solr/Support page immutable? All pages on that wiki are changeable by end users. You just need to create an account on the wiki and then ask on this list to have your wiki username added to the Contributor group. Thanks, Shawn
RE: Can I use multiple cores
Ramprasad Padmanabhan [ramprasad...@gmail.com] wrote:
> I have a single machine with 16GB RAM and 16 CPU cores.

Ah! I thought you had more machines, each with 16 Solr cores. This changes a lot.

400 Solr cores of ~200MB ~= 80GB of data. You're aiming for 7 times that, so about 500GB of data. Running that on a single machine with 16GB of RAM is not unrealistic, but it depends a lot on how often a search is issued and on whether you can unload inactive cores and accept the startup penalty of loading a core the first time a user searches for something. Searches will be really slow if you are using a spinning drive. You might be interested in http://sbdevel.wordpress.com/2013/06/06/memory-is-overrated/

As for indexing, I can understand if you run into problems with 400 concurrent updates to your single-machine setup. You should limit the number of concurrent updates to a bit more than the number of CPU cores, so try with 20 or 40.

- Toke Eskildsen
Re: Modifying date format when using TrieDateField.
The response will always contain the full specification, so you'll get the yyyy-MM-dd'T'HH:mm:ss format. If you want the user to see just yyyy-MM-dd, you could use a DocTransformer to change it on the way out. You cannot change the way the dates are stored internally. The parse-date update processor is just there to allow different input formats; it has no effect on the stored data at all.

Best,
Erick

On Tue, Aug 12, 2014 at 5:33 AM, Modassar Ather modather1...@gmail.com wrote:
> Hi Jack, Thanks for your suggestion. I think the way I am using ParseDateFieldUpdateProcessorFactory is not right, hence the date is not getting transformed to the desired format. [...]
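A rough, untested sketch of the DocTransformer route Erick suggests, against the 4.x API; the class name and the "pub_date" field are invented. Register it in solrconfig.xml with <transformer name="dateonly" class="com.example.DateOnlyTransformerFactory"/> and request it with fl=id,[dateonly].

  import java.text.SimpleDateFormat;
  import java.util.Date;
  import org.apache.solr.common.SolrDocument;
  import org.apache.solr.common.params.SolrParams;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.response.transform.DocTransformer;
  import org.apache.solr.response.transform.TransformerFactory;

  public class DateOnlyTransformerFactory extends TransformerFactory {
      @Override
      public DocTransformer create(final String name, SolrParams params,
                                   SolrQueryRequest req) {
          return new DocTransformer() {
              @Override
              public String getName() { return name; }

              @Override
              public void transform(SolrDocument doc, int docid) {
                  Object v = doc.getFieldValue("pub_date");  // invented field
                  if (v instanceof Date) {
                      // rewrite the full timestamp as yyyy-MM-dd on the way out
                      doc.setField("pub_date",
                              new SimpleDateFormat("yyyy-MM-dd").format((Date) v));
                  }
              }
          };
      }
  }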
Re: Updates to index not available immediately as index scales, even with autoSoftCommit at 1 second
You haven't given us a lot of information to go on (i.e.: full solrconfig.xml, log messages around the time of your update, etc...), but my best guess would be that you are seeing a delay between the time the new searcher is opened and the time it is made available to requests, due to cache warming. The specifics of your cache configs and newSearcher event listeners would impact this ... and of course, you'd see log messages about opening the searcher, the cache warming, etc.

: Date: Tue, 12 Aug 2014 07:18:20 -0700 (PDT)
: From: cwhit cwhi...@solinkcorp.com
: Subject: Updates to index not available immediately as index scales, even with autoSoftCommit at 1 second
:
: I've been trying to debug this but I'm stumped. I have a Solr index with ~40 million documents, currently sitting idle. [...]

-Hoss
http://www.lucidworks.com/
Re: Updates to index not available immediately as index scales, even with autoSoftCommit at 1 second
I'm not seeing any messages in the log with respect to cache warming at the time, but I will investigate that possibility. Thank you. In case it is helpful, I pasted the entire solrconfig.xml at http://pastebin.com/C0iQ7E9a
Re: Updates to index not available immediately as index scales, even with autoSoftCommit at 1 second
: I'm not seeing any messages in the log with respect to cache warming at the : time, but I will investigate that possibility. Thank you. In case it is what logs *do* you see at the time you send the doc? w/o details, we can't help you. : helpful, I pasted the entire solrconfig.xml at http://pastebin.com/C0iQ7E9a -Hoss http://www.lucidworks.com/
Re: Can I use multiple cores
The machines were 32GB RAM boxes. You must do the RAM requirement calculation for your indexes; the number of indexes alone won't be enough to arrive at the RAM requirement.

On Tue, Aug 12, 2014 at 6:59 PM, Ramprasad Padmanabhan ramprasad...@gmail.com wrote:
> I have a single machine with 16GB RAM and 16 CPU cores. What is the h/w you are using?

--
- Noble Paul
Re: Updates to index not available immediately as index scales, even with autoSoftCommit at 1 second
Immediately after triggering the update, this is what is in the logs:

2014-08-12 12:58:48,774 | [71] | 153414367 [qtp2038499066-4772] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={wt=json} {add=[52627624 (1476251068652322816)]} 0 34
2014-08-12 12:58:49,773 | [71] | 153415369 [commitScheduler-7-thread-1] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
2014-08-12 12:58:49,862 | [71] | 153415459 [commitScheduler-7-thread-1] INFO org.apache.solr.search.SolrIndexSearcher – Opening Searcher@65c48c06 main
2014-08-12 12:58:49,874 | [71] | 153415472 [commitScheduler-7-thread-1] INFO org.apache.solr.update.UpdateHandler – end_commit_flush

The end_commit_flush leads me to believe that the soft commit has completed, but perhaps that thought is wrong. There are no other logs for a while, until:

2014-08-12 13:03:49,556 | [71] | 153715147 [commitScheduler-6-thread-1] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
2014-08-12 13:03:49,805 | [71] | 153715402 [commitScheduler-6-thread-1] INFO org.apache.solr.core.SolrCore – SolrDeletionPolicy.onCommit: commits: num=2
  commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@E:\Program Files (x86)\SolrLive\SolrFiles\Solr\service\solr\data\index lockFactory=org.apache.lucene.store.NativeFSLockFactory@1fac1a3c; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_2we,generation=3758}
  commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@E:\Program Files (x86)\SolrLive\SolrFiles\Solr\service\solr\data\index lockFactory=org.apache.lucene.store.NativeFSLockFactory@1fac1a3c; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_2wf,generation=3759}
2014-08-12 13:03:49,805 | [34] | 153715403 [commitScheduler-6-thread-1] INFO org.apache.solr.core.SolrCore – newest commit generation = 3759
2014-08-12 13:03:49,818 | [34] | 153715415 [commitScheduler-6-thread-1] INFO org.apache.solr.update.UpdateHandler – end_commit_flush

At this point, the update is still not present...

2014-08-12 13:11:45,279 | [81] | 154190876 [searcherExecutor-4-thread-1] INFO org.apache.solr.core.SolrCore – QuerySenderListener sending requests to Searcher@65c48c06 main{StandardDirectoryReader(segments_2we:82217:nrt _qkc(4.6):C8161558/879724:delGen=275 _sra(4.6):C2943436/247953:delGen=51 _r2w(4.6):C1149753/18376:delGen=55 _rgs(4.6):C1468449/648612:delGen=107 _tdl(4.6):C583431/7873:delGen=94 _svo(4.6):C197286/7:delGen=5 _t4d(4.6):C247031/2928:delGen=36 _tkf(4.6):C111429/761:delGen=23 _tch(4.6):C6014/81:delGen=22 _tk5(4.6):C3907/242:delGen=21 _tjv(4.6):C3492/119:delGen=13 _thd(4.6):C5014/241:delGen=24 _tdh(4.6):C5375/437:delGen=30 _tj1(4.6):C5989/15:delGen=6 _tkq(4.6):C1749/36:delGen=6 _tmj(4.6):C961/1:delGen=1 _tlm(4.6):C714/9:delGen=5 _tm6(4.6):C2616 _tlx(4.6):C1105/273:delGen=3 _tly(4.6):C5/2:delGen=1 _tm2(4.6):C1 _tm4(4.6):C1 _tmb(4.6):C1 _tmk(4.6):C5 _tml(4.6):C12 _tmm(4.6):C1 _tmn(4.6):C2/1:delGen=1 _tmo(4.6):C1 _tmp(4.6):C1 _tmr(4.6):C1 _tms(4.6):C1)}
2014-08-12 13:11:45,280 | [81] | 154190877 [searcherExecutor-4-thread-1] INFO org.apache.solr.core.SolrCore – QuerySenderListener done.
2014-08-12 13:11:45,280 | [81] | 154190877 [searcherExecutor-4-thread-1] INFO org.apache.solr.handler.component.SpellCheckComponent – Building spell index for spell checker: suggest
2014-08-12 13:11:45,280 | [81] | 154190877 [searcherExecutor-4-thread-1] INFO org.apache.solr.spelling.suggest.Suggester – build()

Still no update...

2014-08-12 13:12:58,424 | [81] | 154264021 [searcherExecutor-4-thread-1] INFO org.apache.solr.core.SolrCore – [collection1] Registered new searcher Searcher@65c48c06 main{StandardDirectoryReader(segments_2we:82217:nrt [...])}

There we go! Finally an update! Almost 15 minutes after making the update, it is visible to queries.
Re: what's the difference between solr and elasticsearch in hdfs case?
I just pinged someone who really knows this stuff, and the reply is that he has copied the index from HDFS to a local file system in order to inspect it with Luke, which means the bits on disk are identical and may freely be copied back and forth. So I'd say go for it.

Erick

On Tue, Aug 12, 2014 at 12:28 AM, Jianyi phoenix.w.2...@qq.com wrote:
> Thanks for your reply. I'm comparing Solr vs. Elasticsearch. Does Solr store its index on HDFS in raw Lucene format? [...]
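If you want to repeat that experiment, something along these lines should work, since the files are a plain Lucene index; the paths are invented.

  hdfs dfs -copyToLocal /solr/collection1/core_node1/data/index /tmp/solr-index
  # then point Luke (or any Lucene-based application) at /tmp/solr-index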
RE: Updates to index not available immediately as index scales, even with autoSoftCommit at 1 second
Based on your solrconfig.xml settings for the filter and queryResult caches, I believe Chris's initial guess is correct. After a commit, there is likely plenty of time spent warming these caches due to the significantly high autowarm counts:

  <filterCache class="solr.FastLRUCache" size="16384"
               initialSize="4096" autowarmCount="4096"/>
  <queryResultCache class="solr.FastLRUCache" size="8192"
                    initialSize="8192" autowarmCount="2048"/>

I suggest you try setting the autowarmCount very low or to zero, and then test to confirm the problem. You might also want to monitor whether any JVM garbage collections occur during this time, causing system pauses. With such large caches (nominally stored in the old generation) you may be setting yourself up for GCs that take a significant amount of time and thus add to your delay.

Matt

On Tuesday, August 12, 2014 11:18 AM, cwhit wrote:
> Immediately after triggering the update, this is what is in the logs: [log excerpts snipped; see cwhit's previous message]
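A concrete starting point for the autowarm experiment Matt suggests; the sizes here are illustrative rather than recommendations, and should be tuned from the cache hit-ratio statistics on the admin Plugins/Stats page.

  <filterCache class="solr.FastLRUCache" size="512"
               initialSize="512" autowarmCount="32"/>
  <queryResultCache class="solr.FastLRUCache" size="512"
                    initialSize="512" autowarmCount="0"/>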
Re: SolrCloud OOM Problem
I have modified my instances to m2.4xlarge 64-bit with 68.4G memory. Hate to ask this, but can you recommend Java memory and GC settings for 90G of data and the above memory? Currently I have:

  CATALINA_OPTS="${CATALINA_OPTS} -XX:NewSize=1536m -XX:MaxNewSize=1536m
                 -Xms5120m -Xmx5120m -XX:+UseParNewGC
                 -XX:+CMSParallelRemarkEnabled -XX:+UseConcMarkSweepGC"

Doesn't this mean I am starting with 5G and never going over 5G? I've already seen a few of those uninverted multi-valued field OOMs on the upgraded host.

Thanks
Tux
Access request
Hello, Please provide me access. User id vzhovtyuk My email vzhovt...@gmail.com Wiki user 'Vitaliy Zhovtyuk'
ICUTokenizer acting very strangely with oriental characters
The field value is this:

20世紀の100人;ポートレートアーカイブス;政治家・軍人;政治家・指導者・軍人;[政治],100peopeof20century,pploftwentycentury,pploftwentycentury

The problem: we can't match this field with a search for 100peopeof20century. The analysis shows that there are three terms indexed at the critical point by ICUTokenizerFactory: 治, 100, and peopeof20century. The 'script' value for the 100 term is Chinese/Japanese instead of Latin. Adding a space before 100 doesn't make any difference in the analysis.

This seems like a bug. Can anyone confirm?

This is the fieldType being used:

  <fieldType name="keyText" class="solr.TextField" sortMissingLast="true"
             omitNorms="true" positionIncrementGap="0">
    <analyzer type="index">
      <!-- remove spaces among hangul and han characters if there is at
           least one hangul character -->
      <!-- a korean char guaranteed at the start of the pattern:
           pattern="(\p{Hangul}\p{Han}*)\s+(?=[\p{Hangul}\p{Han}])" -->
      <charFilter class="solr.PatternReplaceCharFilterFactory"
        pattern="([\p{InHangul_Jamo}\p{InHangul_Compatibility_Jamo}\p{InHangul_Syllables}][\p{InBopomofo}\p{InBopomofo_Extended}\p{InCJK_Compatibility}\p{InCJK_Compatibility_Forms}\p{InCJK_Compatibility_Ideographs}\p{InCJK_Compatibility_Ideographs_Supplement}\p{InCJK_Radicals_Supplement}\p{InCJK_Symbols_And_Punctuation}\p{InCJK_Unified_Ideographs}\p{InCJK_Unified_Ideographs_Extension_A}\p{InCJK_Unified_Ideographs_Extension_B}\p{InKangxi_Radicals}\p{InHalfwidth_And_Fullwidth_Forms}\p{InIdeographic_Description_Characters}]*)\s+(?=[\p{InHangul_Jamo}\p{InHangul_Compatibility_Jamo}\p{InHangul_Syllables}\p{InBopomofo}\p{InBopomofo_Extended}\p{InCJK_Compatibility}\p{InCJK_Compatibility_Forms}\p{InCJK_Compatibility_Ideographs}\p{InCJK_Compatibility_Ideographs_Supplement}\p{InCJK_Radicals_Supplement}\p{InCJK_Symbols_And_Punctuation}\p{InCJK_Unified_Ideographs}\p{InCJK_Unified_Ideographs_Extension_A}\p{InCJK_Unified_Ideographs_Extension_B}\p{InKangxi_Radicals}\p{InHalfwidth_And_Fullwidth_Forms}\p{InIdeographic_Description_Characters}])"
        replacement="$1"/>
      <!-- a korean char guaranteed at the end of the pattern:
           pattern="([\p{Hangul}\p{Han}])\s+(?=[\p{Han}\s]*\p{Hangul})" -->
      <charFilter class="solr.PatternReplaceCharFilterFactory"
        pattern="([\p{InHangul_Jamo}\p{InHangul_Compatibility_Jamo}\p{InHangul_Syllables}\p{InBopomofo}\p{InBopomofo_Extended}\p{InCJK_Compatibility}\p{InCJK_Compatibility_Forms}\p{InCJK_Compatibility_Ideographs}\p{InCJK_Compatibility_Ideographs_Supplement}\p{InCJK_Radicals_Supplement}\p{InCJK_Symbols_And_Punctuation}\p{InCJK_Unified_Ideographs}\p{InCJK_Unified_Ideographs_Extension_A}\p{InCJK_Unified_Ideographs_Extension_B}\p{InKangxi_Radicals}\p{InHalfwidth_And_Fullwidth_Forms}\p{InIdeographic_Description_Characters}])\s+(?=[\p{InBopomofo}\p{InBopomofo_Extended}\p{InCJK_Compatibility}\p{InCJK_Compatibility_Forms}\p{InCJK_Compatibility_Ideographs}\p{InCJK_Compatibility_Ideographs_Supplement}\p{InCJK_Radicals_Supplement}\p{InCJK_Symbols_And_Punctuation}\p{InCJK_Unified_Ideographs}\p{InCJK_Unified_Ideographs_Extension_A}\p{InCJK_Unified_Ideographs_Extension_B}\p{InKangxi_Radicals}\p{InHalfwidth_And_Fullwidth_Forms}\p{InIdeographic_Description_Characters}\s]*[\p{InHangul_Jamo}\p{InHangul_Compatibility_Jamo}\p{InHangul_Syllables}])"
        replacement="$1"/>
      <tokenizer class="solr.ICUTokenizerFactory"/>
      <filter class="solr.PatternReplaceFilterFactory"
              pattern="^(\p{Punct}*)(.*?)(\p{Punct}*)$" replacement="$2"/>
      <filter class="solr.CJKWidthFilterFactory"/>
      <filter class="edu.stanford.lucene.analysis.CJKFoldingFilterFactory"/>
      <filter class="solr.ICUTransformFilterFactory" id="Traditional-Simplified"/>
      <filter class="solr.ICUTransformFilterFactory" id="Katakana-Hiragana"/>
      <filter class="solr.ASCIIFoldingFilterFactory"/>
      <filter class="solr.ICUNormalizer2FilterFactory"/>
      <filter class="solr.CJKBigramFilterFactory" han="true" hiragana="true"
              katakana="true" hangul="true" outputUnigrams="true"/>
      <filter class="solr.LengthFilterFactory" min="1" max="512"/>
    </analyzer>
    <analyzer type="query">
      <!-- remove spaces among hangul and han characters if there is at
           least one hangul character -->
      <!-- a korean char guaranteed at the start of the pattern:
           pattern="(\p{Hangul}\p{Han}*)\s+(?=[\p{Hangul}\p{Han}])" -->
      <charFilter class="solr.PatternReplaceCharFilterFactory" [...]
Re: SolrCloud OOM Problem
On 8/12/2014 3:12 PM, tuxedomoon wrote:
> I have modified my instances to m2.4xlarge 64-bit with 68.4G memory. Hate to ask this, but can you recommend Java memory and GC settings for 90G of data and the above memory? [...]
> Doesn't this mean I am starting with 5G and never going over 5G?

Yes, that's exactly what it means -- you have a heap size limit of 5GB. The OutOfMemory error indicates that Solr needs more heap space than it is getting. You'll need to raise the -Xmx value, and it is usually advisable to configure -Xms to match. The wiki page I linked before includes a link to the following page, listing the GC options that I use beyond the -Xmx setting:

http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

Thanks,
Shawn
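Purely as an illustration of the shape of such settings (the right numbers depend on your index and query mix, and most of the RAM should be left to the OS disk cache rather than the heap), a larger-heap variant of the CATALINA_OPTS above might look like:

  CATALINA_OPTS="${CATALINA_OPTS} -Xms16g -Xmx16g \
    -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
    -XX:+CMSParallelRemarkEnabled \
    -XX:CMSInitiatingOccupancyFraction=70 \
    -XX:+UseCMSInitiatingOccupancyOnly"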
Re: Access request
On 8/12/2014 3:29 PM, Vitaliy Zhovtiuk wrote:
> Please provide me access. User id vzhovtyuk. My email vzhovt...@gmail.com. Wiki user 'Vitaliy Zhovtyuk'

Wiki username added to the Solr wiki contributors group. You didn't indicate exactly what kind of access you wanted, but that's the only kind of access that I am able to grant to end users.

Thanks,
Shawn
Re: ICUTokenizer acting very strangely with oriental characters
See the original message on this thread for full details. Some additional information: this happens on versions 4.6.1, 4.7.2, and 4.9.0.

Here is a screenshot showing the analysis problem in more detail; the first line you can see is the ICUTokenizer output:
https://www.dropbox.com/s/9wbi7lz77ivya9j/ICUTokenizer-wrong-analysis.png

The original field value was:

20世紀の100人;ポートレートアーカイブス;政治家・軍人;政治家・指導者・軍人;[政治],100peopeof20century,pploftwentycentury,pploftwentycentury

Thanks,
Shawn
Solr query involving Street Addresses
I'm very new to Solr and could use a point in the right direction on a task I've been assigned. I have a database containing customer information (phone number, email address, credit card, billing address, shipping address, etc.). I need to take user-entered data and use it to search the indexed records, making decisions/taking actions based on how closely the entered data matches the fields in the indexed documents. For the first three pieces of data mentioned above I need to act on exact matches only, but for the two address fields I would like to act based on how closely the entered data matches the stored data (for example: if the entered shipping address matches 85% of the shipping or billing address in any document in the collection, do X; otherwise do Y). The addresses contain street address, city, state, and zip code.

Any direction or suggestions would be extremely appreciated. As I've said, I'm new to Solr and could use any help that can be provided.

Thanks,
Guph
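Not a full design, but the "matches 85% of the address" requirement maps naturally onto the edismax mm (minimum-should-match) parameter, which controls what fraction of the query terms must match. A sketch, where the collection and field names are invented and mm=85%25 is just "85%" URL-encoded (spaces also left unencoded for readability):

  http://localhost:8983/solr/customers/select
      ?q=742 Evergreen Terrace Springfield IL 62704
      &defType=edismax
      &qf=shipping_address billing_address
      &mm=85%25
      &fl=id,score

The exact-match fields (phone, email, card token) can simply be string fields queried with fq filters.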
Re: ICUTokenizer acting very strangely with oriental characters
Shawn,

ICUTokenizer is operating as designed here. The key to understanding this is o.a.l.analysis.icu.segmentation.ScriptIterator.isSameScript(), called from ScriptIterator.next() with the scripts of two consecutive characters; together these methods find script boundaries. Here's ScriptIterator.isSameScript():

  /** Determine if two scripts are compatible. */
  private static boolean isSameScript(int scriptOne, int scriptTwo) {
    return scriptOne <= UScript.INHERITED
        || scriptTwo <= UScript.INHERITED
        || scriptOne == scriptTwo;
  }

ASCII digits are in the Unicode script named "Common" (see http://www.unicode.org/Public/6.3.0/ucd/Scripts.txt), and UScript.COMMON (0) is less than UScript.INHERITED (1) (see http://www.icu-project.org/~mow/ICU4JCodeCoverage/Current/com/ibm/icu/lang/UScript.html), so no script boundary is detected between a character from an oriental script followed by an ASCII digit, or vice versa: the ASCII digit is assigned the same script as the preceding character.

See UAX#24 for more info: http://www.unicode.org/reports/tr24/tr24-21.html (that's the Unicode 6.3.0 version, which is supported by Lucene/Solr 4.9).

Steve
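The script values involved are easy to check directly with ICU4J (the library the analysis-icu module already depends on); a quick sketch:

  import com.ibm.icu.lang.UScript;

  public class ScriptCheck {
      public static void main(String[] args) {
          for (String s : new String[] {"治", "]", ",", "1", "a"}) {
              int cp = s.codePointAt(0);
              int script = UScript.getScript(cp);
              // prints e.g.: 治 U+6CBB Han (17) ... 1 U+0031 Common (0)
              System.out.printf("%s U+%04X %s (%d)%n",
                      s, cp, UScript.getName(script), script);
          }
      }
  }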
Re: ICUTokenizer acting very strangely with oriental characters
On 8/12/2014 6:29 PM, Steve Rowe wrote:
> ICUTokenizer is operating as designed here. [...] ASCII digits are in the Unicode script named "Common", and UScript.COMMON (0) is less than UScript.INHERITED (1), so no script boundary is detected between a character from an oriental script followed by an ASCII digit, or vice versa: the ASCII digit is assigned the same script as the preceding character.

So the punctuation isn't considered break-worthy? This input:

  [政 治],100foo

becomes 政 治, 100, and foo.

Thanks,
Shawn
Re: ICUTokenizer acting very strangely with oriental characters
In the table below, the IsSameS (is same script) and SBreak? (script break = not IsSameS) decisions are based on what I mentioned in my previous message, and the WBreak (word break) decision is based on the UAX#29 word break rules:

  Char  Code Point  Script  IsSameS?  SBreak?  WBreak?
  ----  ----------  ------  --------  -------  -------
  治    U+6CBB      Han     Yes       No       Yes
  ]     U+005D      Common  Yes       No       Yes
  ,     U+002C      Common  Yes       No       Yes
  1     U+0031      Common  --        --       --

First, script boundaries are found and used as token boundaries (in the above case, no script boundary is found between 治 and 1), and then the UAX#29 word break rules are used to find token boundaries in between the script boundaries. In the above case there are word boundaries between each character, but ICUTokenizer throws away punctuation-only sequences between token boundaries.

Steve
www.lucidworks.com

On Tue, Aug 12, 2014 at 9:01 PM, Shawn Heisey s...@elyograg.org wrote:
> So the punctuation isn't considered break-worthy? [...]
Re: what's the difference between solr and elasticsearch in hdfs case?
Thanks Erick. I will try.
Re: Can I use multiple cores
On 12 August 2014 22:12, Noble Paul noble.p...@gmail.com wrote:
> The machines were 32GB RAM boxes. You must do the RAM requirement calculation for your indexes; the number of indexes alone won't be enough to arrive at the RAM requirement.

And how many machines were running Solr? I expect that I will have to add more servers. What I am looking for is how to calculate how many servers I need.