Re: Storing solr results in excel
If we specify wt=csv, the results appear in CSV format, but I need to store them in a separate Excel file. -- View this message in context: http://lucene.472066.n3.nabble.com/Storing-solr-results-in-excel-tp4103237p4103247.html Sent from the Solr - User mailing list archive at Nabble.com.
How to use batchSize in DataImportHandler to throttle updates in a batch-mode
Hi All, I have a requirement to import a large amount of data from a MySQL database and index the documents (about 1000 documents). During the indexing process I need to do special processing of a field by sending enhancement requests to an external Apache Stanbol server. I have configured my dataimport handler in solrconfig.xml to use the StanbolContentProcessor in the update chain, as below:

  <updateRequestProcessorChain name="stanbolInterceptor">
    <processor class="com.solr.stanbol.processor.StanbolContentProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

  <requestHandler name="/dataimport" class="solr.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
      <str name="update.chain">stanbolInterceptor</str>
    </lst>
  </requestHandler>

My sample data-config.xml is as below:

  <dataConfig>
    <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://localhost:3306/solrTest"
                user="test" password="test123" batchSize="1"/>
    <document name="stanboldata">
      <entity name="stanbolrequest" query="SELECT * FROM documents">
        <field column="id" name="id"/>
        <field column="content" name="content"/>
        <field column="title" name="title"/>
      </entity>
    </document>
  </dataConfig>

When running a large import with about 1000 documents, my Stanbol server goes down, I suspect due to heavy load from the above Solr StanbolInterceptor. I would like to throttle the dataimport into batches, so that Stanbol has to process only a manageable number of requests concurrently. Is this achievable using the batchSize parameter of the dataSource element in data-config.xml? Can someone please give some ideas on throttling the dataimport load in Solr? Thanks, Dileepa
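As far as I know, batchSize on a JdbcDataSource controls the JDBC fetch size (how many rows are pulled from MySQL at a time), not the rate at which documents flow through the update chain, so it may not help here. One alternative is to batch on the client side and pause between batches before handing documents to the enrichment service. A stdlib-only sketch of that idea (the batch size, delay, and service call are all hypothetical placeholders):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchThrottle {
    // Split the work into fixed-size batches so a downstream service
    // (here, hypothetically Stanbol) only ever sees a bounded burst.
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> docIds = new ArrayList<>();
        for (int i = 0; i < 1000; i++) docIds.add(i);
        for (List<Integer> batch : partition(docIds, 50)) {
            // send `batch` to the enrichment service here, then pause;
            // 0 ms is a placeholder for a real inter-batch delay
            Thread.sleep(0);
        }
        System.out.println(partition(docIds, 50).size()); // 20 batches of 50
    }
}
```

The same partition-and-pause loop could equally live inside a custom UpdateRequestProcessor, but the idea is identical: bound the number of in-flight requests Stanbol sees.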
CoreAdminRequest User Credentials 'Unauthorized'
Hello, I have protected my Solr admin URL with a required user role (security constraint). Now I am no longer able to reload my core using CoreAdminRequest:

  CoreAdminRequest requestReload = new CoreAdminRequest();
  requestReload.setAction(CoreAdminParams.CoreAdminAction.RELOAD);
  requestReload.process(getSolrServer());

My SolrServer knows the user and password, but this information is not used in the HTTP request, so I get an 'Unauthorized' response. Does anyone know how to make CoreAdminRequest use the user credentials? Best regards, Snubbel -- View this message in context: http://lucene.472066.n3.nabble.com/CoreAdminRequest-User-Credentials-Unauthorized-tp4103248.html Sent from the Solr - User mailing list archive at Nabble.com.
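Container-level Basic auth is not applied automatically by the client; the request has to carry an Authorization header itself (preemptive Basic auth). Independent of which HttpClient setup is used to attach it, the header value is just "Basic " plus base64(user:password). A stdlib-only sketch of building that value (the credentials are hypothetical):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuth {
    // Build the value of the HTTP "Authorization" header for
    // preemptive Basic auth: "Basic " + base64(user + ":" + password).
    public static String header(String user, String password) {
        String token = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // hypothetical credentials protecting the admin URL
        System.out.println(header("solradmin", "secret"));
    }
}
```

With SolrJ the usual route is to configure the underlying HttpClient with these credentials (or add the header via an interceptor) so every request to the protected /admin/cores path is authenticated up front.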
RE: Need help on Joining and sorting syntax and limitations between multiple documents in solr-4.4.0
Hi Team, Attaching the updated files as per the comments in the ticket. You can now try the VJOIN operation on the updated files. It would also be helpful for us if you could send the correct VJOIN syntax using inputs from the updated files. Thanks, Sukanta From: Sukanta Dey Sent: Friday, November 22, 2013 1:46 PM To: 'Colm Pruvot'; Yann Yu; 'Greg Harris' Cc: 'solr-user@lucene.apache.org'; Sukanta Dey Subject: RE: Need help on Joining and sorting syntax and limitations between multiple documents in solr-4.4.0 Hi Team, I am attaching all the required files we are using for the VJOIN functionality along with the actual requirement statement. I hope this helps you understand the requirement for VJOIN better. Thanks, Sukanta From: Sukanta Dey Sent: Wednesday, September 04, 2013 1:50 PM To: 'solr-user@lucene.apache.org' Cc: Sukanta Dey Subject: Need help on Joining and sorting syntax and limitations between multiple documents in solr-4.4.0 Hi Team, In my project I am going to use Apache solr-4.4.0 for searching. While doing that I need to join multiple Solr documents within the same core on a field that is common across the documents. I can join the documents successfully with the solr-4.4.0 join syntax, and it returns the expected result; but my next requirement is to sort the returned result on fields from the documents on the join's "from" side, which I was not able to do. Let me explain the problem in detail along with the files I am using ...

1) Files being used:

a. Picklist_1.xml --
  <add><doc>
    <field name="describedObjectId">t1324838</field>
    <field name="describedObjectType">7</field>
    <field name="picklistItemId">956</field>
    <field name="siteId">130712901</field>
    <field name="en">Draft</field>
    <field name="gr">Draoft</field>
  </doc></add>

b. Picklist_2.xml ---
  <add><doc>
    <field name="describedObjectId">t1324837</field>
    <field name="describedObjectType">7</field>
    <field name="picklistItemId">87749</field>
    <field name="siteId">130712901</field>
    <field name="en">New</field>
    <field name="gr">Neuo</field>
  </doc></add>

c. AssetID_1.xml ---
  <add><doc>
    <field name="def14227_picklist">t1324837</field>
    <field name="describedObjectId">a180894808</field>
    <field name="describedObjectType">1</field>
    <field name="isMetadataComplete">true</field>
    <field name="lastUpdateDate">2013-09-02T09:28:18Z</field>
    <field name="ownerId">130713716</field>
    <field name="siteId">130712901</field>
  </doc></add>

d. AssetID_2.xml
  <add><doc>
    <field name="def14227_picklist">t1324838</field>
    <field name="describedObjectId">a171658357</field>
    <field name="describedObjectType">1</field>
    <field name="ownerId">130713716</field>
    <field name="rGroupId">2283961</field>
    <field name="rGroupId">2290309</field>
    <field name="rGroupPermissionLevel">7</field>
    <field name="rGroupPermissionLevel">7</field>
    <field name="rRuleId">13503796</field>
    <field name="rRuleId">15485964</field>
    <field name="rUgpId">38052</field>
    <field name="rUgpId">41133</field>
    <field name="siteId">130712901</field>
  </doc></add>

2) Requirement:
i. We need to join the files using the def14227_picklist field from AssetID_1.xml and AssetID_2.xml and the describedObjectId field from Picklist_1.xml and Picklist_2.xml.
ii. After joining we need all the fields from the AssetID_*.xml files plus the en and gr fields from the Picklist_*.xml files.
iii. While joining we also need to sort the result on the en field value.

3) I was trying the q={!join from=inner_id to=outer_id}zzz:vvv syntax but had no luck. Any help/suggestion would be appreciated. Thanks, Sukanta Dey
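For reference, given the documents above, the standard join direction that returns asset documents whose picklist entry matches a condition would look something like this (field names taken from the files; the condition on en is just an illustration):

```
q={!join from=describedObjectId to=def14227_picklist}en:Draft
```

Note, however, that Solr's {!join} only returns documents from the "to" side; it cannot return fields from, or sort by fields of, the "from"-side (picklist) documents, which is exactly the sorting requirement above. That typically has to be solved by denormalizing the en/gr values into the asset documents at index time.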
Re: In a functon query, I can't get the ValueSource when extend ValueSourceParser
As your DateSourceParser is put in the standardValueSourceParsers map with the key dateDeboost (right?), I think you need to specify the field for your source, like dateDeboost(title). 26.11.2013, 06:46, sling sling...@gmail.com: Thanks a lot for your reply, Chris. I was trying to sort the query result by the date function, passing q={!boost b=dateDeboost()}title:test to the /select request handler. Before, my custom DateFunction was like this:

  public class DateFunction extends FieldCacheSource {
      private static final long serialVersionUID = 6752223682280098130L;
      private static long now;

      public DateFunction(String field) {
          super(field);
          now = System.currentTimeMillis();
      }

      @Override
      public FunctionValues getValues(Map context, AtomicReaderContext readerContext) throws IOException {
          long[] times = cache.getLongs(readerContext.reader(), field, false);
          final float[] weights = new float[times.length];
          for (int i = 0; i < times.length; i++) {
              weights[i] = ScoreUtils.getNewsScoreFactor(now, times[i]);
          }
          return new FunctionValues() {
              @Override
              public float floatVal(int doc) {
                  return weights[doc];
              }
          };
      }
  }

It calculates every document's date-weight, even though only one doc's date-weight is needed at a time, so it runs slowly.
When I look at the source code of the recip function in org.apache.solr.search.ValueSourceParser:

  addParser("recip", new ValueSourceParser() {
      @Override
      public ValueSource parse(FunctionQParser fp) throws SyntaxError {
          ValueSource source = fp.parseValueSource();
          float m = fp.parseFloat();
          float a = fp.parseFloat();
          float b = fp.parseFloat();
          return new ReciprocalFloatFunction(source, m, a, b);
      }
  });

and in ReciprocalFloatFunction it gets the value like this:

  @Override
  public FunctionValues getValues(Map context, AtomicReaderContext readerContext) throws IOException {
      final FunctionValues vals = source.getValues(context, readerContext);
      return new FloatDocValues(this) {
          @Override
          public float floatVal(int doc) {
              return a / (m * vals.floatVal(doc) + b);
          }
          @Override
          public String toString(int doc) {
              return Float.toString(a) + "/(" + m + "*float(" + vals.toString(doc) + ')' + '+' + b + ')';
          }
      };
  }

So I think this is what I want. When calculating a doc's date-weight, I needn't call cache.getLongs(x); instead, I should use source.getValues(xxx). Therefore I changed my code, but on fp.parseValueSource() it throws an error like this:

  org.apache.solr.search.SyntaxError: Expected identifier at pos 12 str='dateDeboost()'

Do I describe it clearly this time? Thanks again! sling -- View this message in context: http://lucene.472066.n3.nabble.com/In-a-functon-query-I-can-t-get-the-ValueSource-when-extend-ValueSourceParser-tp4103026p4103207.html Sent from the Solr - User mailing list archive at Nabble.com.
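As a sanity check on the math the thread is discussing, ReciprocalFloatFunction computes a / (m * x + b) per document value x; with a large date value x (ms since epoch) and suitably small m, newer documents get scores closer to a/b and older ones decay toward 0. A standalone sketch of just that formula:

```java
public class Recip {
    // The recip function: a / (m * x + b), as implemented by
    // ReciprocalFloatFunction.floatVal for each doc value x.
    public static float recip(float x, float m, float a, float b) {
        return a / (m * x + b);
    }

    public static void main(String[] args) {
        System.out.println(recip(0f, 1f, 1f, 1f)); // x=0 gives a/b = 1.0
    }
}
```

The SyntaxError in the quoted message is consistent with the reply above: dateDeboost() is parsed with fp.parseValueSource(), which expects an argument, so the call needs a field, e.g. dateDeboost(ptime).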
Is there any limit how many documents can be indexed by apache solr
Dear All, I am using Apache Solr 3.6.2 with Drupal 7. Users keep adding their profiles (resumes), and with a cron task from Drupal the documents get indexed. Recently I observed that after indexing around 11,000 documents, further documents are not getting indexed. Is there any configuration for the maximum number of documents that can be indexed? Kindly help. Thanks, kamal
Re: Is there any limit how many documents can be indexed by apache solr
Hi, In Lucene a single index is supposed to be able to hold up to about 2.1 billion documents, since document numbers are Java ints ( http://lucene.apache.org/core/3_0_3/fileformats.html#Limitations — the 274 billion figure on that page refers to unique terms), so in Solr it should be something like that. Anyway, the maximum is far larger than those 11,000 ;) Could it be that you are reusing IDs, so the new documents overwrite the old ones? 2013/11/26 Kamal Palei palei.ka...@gmail.com Dear All, I am using Apache Solr 3.6.2 with Drupal 7. Users keep adding their profiles (resumes), and with a cron task from Drupal the documents get indexed. Recently I observed that after indexing around 11,000 documents, further documents are not getting indexed. Is there any configuration for the maximum number of documents that can be indexed? Kindly help. Thanks, kamal -- Alejandro Marqués Rodríguez Paradigma Tecnológico http://www.paradigmatecnologico.com Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón Tel.: 91 352 59 42
Newline delimiter is not working in Solr with Velocity
Hi All, I am new to Solr. I am building a small search application using Solr, with a Velocity template to display the results. After indexing the data into Solr, the search results do not display the newline delimiter (\n). When I check the indexed data using Luke, I found that '\n' is present, but while displaying the results it is not shown, i.e., the results are displayed on one line instead of with newlines. I googled a lot, but no luck. Please help me with this. Thanks in advance! -- View this message in context: http://lucene.472066.n3.nabble.com/Newline-delimiter-is-not-working-in-Solr-with-Velocity-tp4103271.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Is there any limit how many documents can be indexed by apache solr
Hello! Check your application server logs too. Maybe you're trying to index documents with some syntax error and they are being skipped. Regards, - Luis Cappa
Re: How to retain the original format of input document in search results in SOLR - Tomcat
Thanks Erick for your reply. I am using a Velocity template to display the result: #field('SDtext') — here SDtext is my field. Here is my field definition in schema.xml:

  <field name="Resolution" type="text_en" indexed="true" stored="true"/>
  <field name="SDtext" type="string" indexed="false" stored="true" multiValued="false"/>
  <copyField source="Resolution" dest="SDtext"/>

  <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
    <analyzer type="index">
      <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\n" replacement="&lt;br&gt;"/>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
      <filter class="solr.PorterStemFilterFactory"/>
      <filter class="solr.TrimFilterFactory"/>
    </analyzer>
  </fieldType>

Please guide me on what I am doing wrong? Thanks in advance -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-retain-the-original-format-of-input-document-in-search-results-in-SOLR-Tomcat-tp4102327p4103276.html Sent from the Solr - User mailing list archive at Nabble.com.
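One likely explanation for the schema above: the PatternReplaceCharFilter only rewrites the token stream that gets *indexed* for text_en; the *stored* value (which is what Velocity displays, and SDtext is a plain stored string anyway) is returned verbatim, and HTML then collapses the raw \n as whitespace. So the \n-to-&lt;br&gt; replacement has to happen at render time in the template. A hedged sketch, assuming the usual $doc reference inside the hit loop of the Velocity templates:

```velocity
## hypothetical: convert stored newlines to HTML line breaks at display time
$doc.getFieldValue('SDtext').toString().replaceAll("\n", "<br/>")
```

Velocity lets you call Java methods on references, so String.replaceAll works directly on the field value; the exact reference name depends on which template you are editing.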
Re: Storing solr results in excel
My response is coming in the following way:

  {
    "response": {
      "numFound": 21,
      "start": 0,
      "maxScore": 1,
      "docs": [
        {
          "pageType": "LP",
          "category": "some category name",
          "url": "some url",
          "score": 1
        }
      ]
    }
  }

I need to store the results in separate String variables. Can anybody help me do this using SolrJ? -- View this message in context: http://lucene.472066.n3.nabble.com/Storing-solr-results-in-excel-tp4103237p4103280.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Is there any limit how many documents can be indexed by apache solr
Thanks Alejandro and Luis. If I need to see the logs, how can I see them? Are they stored in any default log files? I am using the below command to start Apache Solr:

  java -Xms64m -Xmx6g -jar start.jar

I am using it along with Drupal 7.1.5, and I am trying to find out whether it is a Drupal issue or an Apache Solr issue. I am not getting a clue where to start. In the terminal where I started Apache Solr, I get the below logs when I attempt to index the remaining documents:

  Nov 26, 2013 5:28:56 AM org.apache.solr.core.SolrCore execute
  INFO: [] webapp=/solr path=/admin/ping params={} hits=0 status=0 QTime=1
  Nov 26, 2013 5:28:56 AM org.apache.solr.core.SolrCore execute
  INFO: [] webapp=/solr path=/admin/ping params={} status=0 QTime=1
  Nov 26, 2013 5:28:56 AM org.apache.solr.core.SolrCore execute
  INFO: [] webapp=/solr path=/admin/ping params={} hits=0 status=0 QTime=1
  Nov 26, 2013 5:28:56 AM org.apache.solr.core.SolrCore execute
  INFO: [] webapp=/solr path=/admin/ping params={} status=0 QTime=1

In addition to this, is there any other log folder? Kindly bear with me, as I am very much a novice in Apache Solr. Regards, kamal
Re: Storing solr results in excel
Excel can read/import CSV files, so simply write out the Solr results as a CSV text file. If you need/wish to write out a native Excel XLSM file, that is beyond the scope of Solr, SolrJ, and this mailing list; try a Google search on that topic. Usually, Solr results are consumed directly by an application. -- Jack Krupansky -Original Message- From: kumar Sent: Tuesday, November 26, 2013 3:31 AM To: solr-user@lucene.apache.org Subject: Re: Storing solr results in excel If we specify wt=csv, the results appear in CSV format, but I need to store them in a separate Excel file. -- View this message in context: http://lucene.472066.n3.nabble.com/Storing-solr-results-in-excel-tp4103237p4103247.html Sent from the Solr - User mailing list archive at Nabble.com.
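Following up on the CSV suggestion: if the goal is just a file Excel can open, writing the response fields out with correct CSV quoting is enough. A stdlib-only sketch (the field values are hardcoded stand-ins for what you would pull from a SolrJ QueryResponse/SolrDocument):

```java
import java.util.Arrays;
import java.util.List;

public class CsvExport {
    // Quote a CSV field per RFC 4180: wrap in quotes if it contains
    // a comma, quote, or newline, and double any embedded quotes.
    public static String escape(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Join already-escaped fields into one CSV row.
    public static String toCsvRow(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(escape(fields.get(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // hypothetical doc fields: pageType, category, url
        System.out.println(toCsvRow(Arrays.asList("LP", "some, category", "some url")));
        // prints: LP,"some, category",some url
    }
}
```

Write one such row per SolrDocument to a .csv file and Excel will open it directly; no XLS library is needed.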
Re: Is there any limit how many documents can be indexed by apache solr
Hi All, I tried to get the log from the terminal. The complete log I have put at the end of this email. In one place in the log I see:

  Nov 26, 2013 5:46:24 AM org.apache.solr.core.SolrCore execute
  INFO: [] webapp=/solr path=/update params={wt=json} status=0 QTime=1
  Nov 26, 2013 5:46:25 AM org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener buildSpellIndex
  INFO: Building spell index for spell checker: jarowinkler
  Nov 26, 2013 5:46:26 AM org.apache.solr.core.SolrCore registerSearcher
  INFO: [] Registered new searcher Searcher@119a0c4e main
  Nov 26, 2013 5:46:26 AM org.apache.solr.search.SolrIndexSearcher close
  INFO: Closing Searcher@35d22ddb main

Though it is not an error or anything, this is the log that was showing in my terminal when indexing stopped. The complete log is below. Can somebody help me index more documents? Indexing currently stops after around 11,000 documents.

Nov 26, 2013 5:46:16 AM org.apache.solr.update.processor.LogUpdateProcessor finish INFO: {add=[8lftl7/node/5773, 8lftl7/node/5774, 8lftl7/node/5775, 8lftl7/node/5776, 8lftl7/node/5777, 8lftl7/node/5778, 8lftl7/node/5779, 8lftl7/node/5780, ... (20 adds)]} 0 3 Nov 26, 2013 5:46:16 AM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/update params={wt=json} status=0 QTime=3 Nov 26, 2013 5:46:16 AM org.apache.solr.update.processor.LogUpdateProcessor finish INFO: {add=[8lftl7/node/5793, 8lftl7/node/5794, 8lftl7/node/5795, 8lftl7/node/5796, 8lftl7/node/5797, 8lftl7/node/5798, 8lftl7/node/5799, 8lftl7/node/5800, ...
(10 adds)]} 0 2 Nov 26, 2013 5:46:16 AM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/update params={wt=json} status=0 QTime=2 Nov 26, 2013 5:46:16 AM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/admin/ping params={} hits=0 status=0 QTime=1 Nov 26, 2013 5:46:16 AM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/admin/ping params={} status=0 QTime=1 Nov 26, 2013 5:46:16 AM org.apache.solr.update.processor.LogUpdateProcessor finish INFO: {add=[8lftl7/file/5750-node-6276, 8lftl7/file/5751-node-6277, 8lftl7/file/5752-node-6278, 8lftl7/file/5753-node-6279, 8lftl7/file/5754-node-6280, 8lftl7/file/5755-node-6281, 8lftl7/file/5756-node-6282, 8lftl7/file/5757-node-6283, ... (20 adds)]} 0 25 Nov 26, 2013 5:46:16 AM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/update params={wt=json} status=0 QTime=25 Nov 26, 2013 5:46:16 AM org.apache.solr.update.DirectUpdateHandler2 commit INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true,expungeDeletes=false) Nov 26, 2013 5:46:23 AM org.apache.solr.core.SolrDeletionPolicy onCommit INFO: SolrDeletionPolicy.onCommit: commits:num=2 commit{dir=/usr/local/src/apachesolr/apache-solr-3.6.2/example/solr/data/index,segFN=segments_1o4,version=1362286025701,generation=2164,filenames=[_20f.prx, _20f.tii, _20h.tvx, _20f.fnm, _20g.tii, _20h.frq, _20g.nrm, _20h.fdt, _20f.tvd, _20g.tis, _20f.frq, _20f.tvf, _20h.fdx, _20h.fnm, _20g.tvx, _20h.tvf, _20h.tis, _20f.tis, segments_1o4, _20g.fdx, _20g.fnm, _20h.tii, _20g.tvf, _20g.fdt, _20f_1.del, _20g.tvd, _20f.fdt, _20f.fdx, _20g.frq, _20g.prx, _20f.tvx, _20f.nrm, _20h.tvd, _20h.prx, _20h.nrm] commit{dir=/usr/local/src/apachesolr/apache-solr-3.6.2/example/solr/data/index,segFN=segments_1o5,version=1362286025705,generation=2165,filenames=[_20j.prx, _20j.tii, _20j.fdt, _20j.nrm, _20j.frq, _20j.fdx, segments_1o5, _20j.tvf, _20j.tvd, _20j.fnm, _20j.tvx, _20j.tis] Nov 26, 2013 5:46:23 AM 
org.apache.solr.core.SolrDeletionPolicy updateCommits INFO: newest commit = 1362286025705 Nov 26, 2013 5:46:23 AM org.apache.solr.search.SolrIndexSearcher init INFO: Opening Searcher@119a0c4e main Nov 26, 2013 5:46:23 AM org.apache.solr.update.DirectUpdateHandler2 commit INFO: end_commit_flush Nov 26, 2013 5:46:23 AM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming Searcher@119a0c4e main from Searcher@35d22ddb main fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} Nov 26, 2013 5:46:23 AM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming result for Searcher@119a0c4e main fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} Nov 26, 2013 5:46:23 AM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming Searcher@119a0c4e main from Searcher@35d22ddb main filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} Nov 26, 2013 5:46:23 AM org.apache.solr.search.SolrIndexSearcher warm INFO: autowarming result for Searcher@119a0c4e main
Re: Trouble with manually routed collection after upgrade to 4.6
That's not good. I'll investigate. On Mon, Nov 25, 2013 at 10:29 PM, Brett Hoerner br...@bretthoerner.com wrote: Think I got it. For some reason this was in my clusterstate.json after the upgrade (note that I was using 4.5.X just fine previously...): router: { name: compositeId }, I stopped all my nodes and manually edited this to be implicit (is there a tool for this? I've always done it manually), started the cluster up again, and it's all good now. On Mon, Nov 25, 2013 at 10:38 AM, Brett Hoerner br...@bretthoerner.comwrote: Here's my clusterstate.json: https://gist.github.com/bretthoerner/a8120a8d89c93f773d70 On Mon, Nov 25, 2013 at 10:18 AM, Brett Hoerner br...@bretthoerner.comwrote: Hi, I've been using a collection on Solr 4.5.X for a few weeks and just did an upgrade to 4.6 and am having some issues. First: this collection is, I guess, implicitly routed. I do this for every document insert using SolrJ: document.addField(_route_, shardId) After upgrading the servers to 4.6 I now get the following on every insert/delete when using either SolrJ 4.5.1 or 4.6: org.apache.solr.common.SolrException: No active slice servicing hash code 17b9dff6 in DocCollection In the clusterstate *none* of my shards have a range set (they're all null), but I thought this would be expected since I do routing myself. Did the upgrade change something here? I didn't see anything related to this in the upgrade notes. Thanks, Brett -- Regards, Shalin Shekhar Mangar.
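For anyone hitting the same error after an upgrade: the hand-edit described above amounts to changing the collection's router entry in clusterstate.json so that a manually-routed collection is not treated as hash-routed. Sketch of the relevant fragment (the surrounding collection JSON is omitted):

```json
"router": { "name": "implicit" }
```

With "implicit", document placement is driven by the _route_ value the client supplies, and no hash ranges are expected on the shards, which matches the all-null ranges in the clusterstate above.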
Re: HttpSolrServer - Http Client Connection pooling issue
On 11/25/2013 11:04 PM, imgaur...@yahoo.in wrote: We are using Solr 4.3.1 and we are using HttpSolrServer for querying Solr. We are trying to do a load and stress test using Jmeter and we can see that after certain requests Solr responds in very unusual way. It gets stuck and responds only after sometime. Upon checking the Http Connections we realized that there are so many open connections that are not closed. My questions are: 1. Is there a way to do HTTP connection pooling ? As of Solr 4.2, SolrJ uses SystemDefaultHttpClient for its HttpClient, which by default uses PoolingClientConnectionManager. It should already be using connection pooling. Earlier versions (with DefaultHttpClient) probably used a similar connection manager. There is a bug in SolrJ related to leaking HttpClient connections. That bug is fixed in 4.5.1, so the fix is also in the just-released 4.6.0. https://issues.apache.org/jira/browse/SOLR-4327 Note that HttpSolrServer instance is static. 2. Can I configure Http Connections using solrconfig file ? The solrconfig.xml file configures the Solr *server*. Although some server settings do affect client connections, most of the configuration for the client is done in the client. You can create your own HttpClient for SolrJ and configure it however you like. If you are using SolrCloud or distributed search, then there is a SolrJ client within Solr itself that can be configured with HttpShardHandler configuration in solrconfig.xml or solr.xml. Thanks, Shawn
SOLR-4237 -- internal SolrJ client?
I am looking at SOLR-4237: https://issues.apache.org/jira/browse/SOLR-4327
Re: SOLR-4237 -- internal SolrJ client?
On 11/26/2013 7:52 AM, Shawn Heisey wrote: I am looking at SOLR-4237: https://issues.apache.org/jira/browse/SOLR-4327 My email client got over-excited and sent that before I was ready. What I want to know is whether this bug affects the internal SolrJ client used for distributed search. All my requests are distributed. Thanks, Shawn
Request to join the ContributorsGroup
To whom it may concern, I am currently working on Solr and I would like to contribute a bit on the wiki pages. So, I would like to get access to the Solr Wiki pages. Thanks, Anastasios -- Anastasios Zouzias IBM Research Division - Zurich Research Laboratory Saumerstrasse 4 8803 Ruschlikon - Switzerland a...@zurich.ibm.com
How to request not directly my SOLR server ?
Dear All, I showed my Solr server to a friend, and his first question was: "You can query your Solr database directly from your browser?! Isn't that a security problem? Anyone who has your request link can use your database directly?" So I ask the question here. I have protected my admin panel, but is it possible to protect against direct requests? Using Google, most results concern admin panel security, but I can't find information about this. Thanks for your comments, Bruno --- This email contains no viruses or malware because avast! Antivirus protection is active. http://www.avast.com
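The common answers are to keep Solr reachable only from the application (firewall / bind to an internal interface, with a reverse proxy in front), or to extend the same servlet-container security-constraint mechanism already protecting the admin panel to the rest of the webapp. A hedged web.xml sketch of the latter (the role name, and whether you protect "/*" or something narrower, are assumptions to adapt):

```xml
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Solr</web-resource-name>
    <!-- protect all request paths, not just the admin pages -->
    <url-pattern>/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>solr-user</role-name>
  </auth-constraint>
</security-constraint>
```

Note that once search requests require authentication, your own application must send credentials too, so most deployments prefer the network-level approach: Solr is never exposed to browsers at all.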
Re: Request to join the ContributorsGroup
Sure, and thanks! First, create a login on the Wiki. Then let us know what that name is and we'll add you to the ACL. Unfortunately we had a problem a while ago with spam bots creating bogus pages and had to lock it down so this step became necessary. Best, Erick On Tue, Nov 26, 2013 at 10:15 AM, Anastasios Zouzias zouz...@gmail.comwrote: To whom it may concern, I am currently working on Solr and I would like to contribute a bit on the wiki pages. So, I would like to get access to the Solr Wiki pages. Thanks, Anastasios -- Anastasios Zouzias IBM Research Division - Zurich Research Laboratory Saumerstrasse 4 8803 Ruschlikon - Switzerland a...@zurich.ibm.com
Re: a function query of time, frequency and score.
You can combine a whole series of mathematical functions that take into account the values of fields, and use that for scoring. Functions can take other functions as operands, so you can build something arbitrarily complex that takes several of your fields into account. If this still doesn't apply, please provide some examples of what your docs look like and what you want to do with them. Best, Erick On Mon, Nov 25, 2013 at 11:45 PM, sling sling...@gmail.com wrote: Thanks, Erick. What I want to do is customize the sort by date, time, and number. I want to know whether there is some formula to tackle this. Thanks again! sling On Fri, Nov 22, 2013 at 9:11 PM, Erick Erickson [via Lucene] ml-node+s472066n4102599...@n3.nabble.com wrote: Not quite sure what you're asking. The field() function query brings the value of a field into the score, something like: http://localhost:8983/solr/select?wt=json&fl=id%20score&q={!boost%20b=field(popularity)}ipod Best, Erick On Thu, Nov 21, 2013 at 10:43 PM, sling wrote: Hi, guys. I indexed 1000 documents, which have fields like title, ptime and frequency. The title is a text field, ptime is a date field, and frequency is an int field. The frequency field has ups and downs: sometimes its value is 0, and sometimes it is 999. Now, in my app, the query works well with a function query. The function query is implemented as the score multiplied by a decreasing date-weight array. However, I have no idea how to add the frequency to this formula... Could someone give me a clue? Thanks again! sling -- View this message in context: http://lucene.472066.n3.nabble.com/a-function-query-of-time-frequency-and-score-tp4102531.html Sent from the Solr - User mailing list archive at Nabble.com.
-- View this message in context: http://lucene.472066.n3.nabble.com/a-function-query-of-time-frequency-and-score-tp4102531p4103216.html Sent from the Solr - User mailing list archive at Nabble.com.
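One concrete way to fold both date and frequency into the score, sketched with standard Solr function queries (the constants are placeholders to tune; 3.16e-11 is the commonly cited value making a one-year-old ptime halve the boost, and log(sum(...,1)) keeps frequency=0 documents from zeroing out):

```
q={!boost b=product(recip(ms(NOW,ptime),3.16e-11,1,1),log(sum(field(frequency),1)))}title:test
```

Here recip(ms(NOW,ptime),m,a,b) provides the decreasing date weight and the log term damps the wide 0-999 swings of the frequency field before the two are multiplied into the relevance score.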
Re: Setting solr.data.dir for SolrCloud instance
The data _is_ separated from the code. It's all relative to solr_home, which need not have any relation to where the code is executing from. For instance, I can start Solr like java -Dsolr.solr.home=/Users/Erick/testdir/solr -jar start.jar and have my war in a completely different place. Best, Erick On Tue, Nov 26, 2013 at 1:08 AM, adfel70 adfe...@gmail.com wrote: Thanks for the reply, Erick. Actually, I didn't think this through. I just thought it would be a good idea to separate the data from the application code. I guess I'll leave it without setting the dataDir parameter and add a symlink. -- View this message in context: http://lucene.472066.n3.nabble.com/Setting-solr-data-dir-for-SolrCloud-instance-tp4103052p4103228.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Request to join the ContributorsGroup
Hi Erick, Thanks for the prompt reply. My username on the wiki is zouzias. Best, Anastasios

On Tue, Nov 26, 2013 at 4:58 PM, Erick Erickson erickerick...@gmail.com wrote:
Sure, and thanks! First, create a login on the wiki. Then let us know what that name is and we'll add you to the ACL. Unfortunately we had a problem a while ago with spam bots creating bogus pages and had to lock it down, so this step became necessary. Best, Erick

On Tue, Nov 26, 2013 at 10:15 AM, Anastasios Zouzias zouz...@gmail.com wrote:
To whom it may concern, I am currently working on Solr and I would like to contribute a bit to the wiki pages. So, I would like to get access to the Solr wiki pages. Thanks, Anastasios -- Anastasios Zouzias, IBM Research Division - Zurich Research Laboratory, Saumerstrasse 4, 8803 Ruschlikon - Switzerland, a...@zurich.ibm.com
Re: Revolution writeup
Hi Mike, Thanks a lot for sharing. I posted my impressions of the conference as well, right after it finished, so I'll share them here if you don't mind:
Day 1: http://dmitrykan.blogspot.fi/2013/11/lucene-revolution-eu-2013-in-dublin-day.html
Day 2: http://dmitrykan.blogspot.fi/2013/11/lucene-revolution-eu-2013-in-dublin-day_13.html
Dmitry

On Mon, Nov 25, 2013 at 8:42 PM, Michael Sokolov msoko...@safaribooksonline.com wrote:
I just posted a writeup of the Lucene/Solr Revolution Dublin conference. I've been waiting for videos to become available, but I got impatient. The slides are mostly there, though. Sorry if I missed your talk -- I'm hoping to catch up when the videos are posted...
http://blog.safariflow.com/2013/11/25/this-revolution-will-be-televised/
-Mike Sokolov

-- Dmitry Blog: http://dmitrykan.blogspot.com Twitter: twitter.com/dmitrykan
Re: Setting solr.data.dir for SolrCloud instance
The problem we had was that we tried to run:

java -Dsolr.data.dir=/opt/solr/data -Dsolr.solr.home=/opt/solr/home -jar start.jar

and got different behavior in how Solr handles these 2 params. We created 2 collections, which created 2 cores. We then got 2 home dirs for the cores, as expected:
/opt/solr/home/collection1_shard1_replica1
/opt/solr/home/collection2_shard1_replica1
but instead of creating 2 data dirs like:
/opt/solr/data/collection1_shard1_replica1
/opt/solr/data/collection2_shard1_replica1
Solr had both cores' data dirs pointing to the same directory - /opt/solr/data. When we tried putting a relative path in -Dsolr.data.dir, it worked as expected. I don't know if this is a bug, but we thought of 2 solutions in our case:
1. point -Dsolr.data.dir to a relative path and symlink that path to the absolute path we wanted in the first place.
2. don't provide -Dsolr.data.dir at all, and then Solr puts the data dir inside the home dir, which, as said, works with relative paths.
We chose the first option for now.

Erick Erickson wrote: The data _is_ separated from the code. It's all relative to solr_home which need not have any relation to where the code is executing from. [...]
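Option 1 above (a relative solr.data.dir plus a symlink to the real storage) can be sketched as follows. The /tmp paths are placeholders standing in for the /opt/solr layout in the post, so this can be tried safely:

```shell
# Hedged sketch of the symlink workaround: Solr sees a relative "data"
# path under the instance dir, while the bytes actually live elsewhere.
REAL=/tmp/solr-demo/real-data      # placeholder for the intended absolute location
HOME_DIR=/tmp/solr-demo/home       # placeholder for solr.solr.home
mkdir -p "$REAL" "$HOME_DIR"
ln -sfn "$REAL" "$HOME_DIR/data"   # relative dataDir resolves through this link
readlink "$HOME_DIR/data"
```

Because the per-core dataDir defaults to a path relative to each instanceDir, each core still gets its own subdirectory while the storage sits on the volume you actually wanted.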
Re: Setting solr.data.dir for SolrCloud instance
On Nov 25, 2013, at 8:12 AM, adfel70 adfe...@gmail.com wrote: I was expecting that the path I sent would serve as the BASE path for all cores that the node hosts.

When running Solr on HDFS, there is a similar prop you can use: -Dsolr.hdfs.home. If you set that, all data dirs are created nicely under it. We talked about wanting a similar option for SolrCloud and the local filesystem a while back. If there is no JIRA issue for it, please file one! - Mark
Multivalued true Error?
Hi; I've ported this example from Scala into Java: http://sujitpal.blogspot.com/2013/07/porting-payloads-to-solr4.html#! However, should the field be multiValued=true in that example? PS: I use Solr 4.5.1. Thanks; Furkan KAMACI
Re: Revolution writeup
On 26/11/2013 16:19, Dmitry Kan wrote: Hi Mike, Thanks a lot for sharing. I posted my impressions of the conference as well [...]

Me too!
http://www.flax.co.uk/blog/2013/11/06/lucene-revolution-2013-dublin-day-1/
http://www.flax.co.uk/blog/2013/11/08/lucene-revolution-2013-dublin-day-2/
Four of the Flax team were there and we had a great time. Charlie

-- Charlie Hull, Flax - Open Source Enterprise Search, tel/fax: +44 (0)8700 118334, mobile: +44 (0)7767 825828, web: www.flax.co.uk
Solr 3.6.1 stalling with high CPU and blocking on field cache
I've been tracking a problem in our Solr environment for a while, with periodic stalls of Solr 3.6.1. I'm running up against a wall on ideas to try and thought I might get some insight from others on this list.

The load on the server is normally anywhere between 1-3. It's an 8-core machine with 40GB of RAM. I have about 25GB of index data that is replicated to this server every 5 minutes. It's taking about 200 connections per second, and roughly every 5-10 minutes it will stall for about 30 seconds to a minute. The stall causes the load to go as high as 90. It is all CPU bound in user space - all cores go to 99% utilization (spinlock?). When doing a thread dump, the following line is blocked in all running Tomcat threads:

org.apache.lucene.search.FieldCacheImpl$Cache.get ( FieldCacheImpl.java:230 )

Looking at the source code in 3.6.1, that is a call into a synchronized() block, which blocks all threads and causes the backlog. I've tried to correlate these events to the replication events - but even with replication disabled, this still happens.

We run multiple data centers using Solr, and I was comparing garbage collection between them. The old generation is collected very differently in this data center versus the others: here it is collected in one massive collect event (several gigabytes' worth), while the other data center is more sawtoothed and collects only 500MB-1GB at a time.

Here are my parameters to java (the same in all environments):

/usr/java/jre/bin/java \
 -verbose:gc \
 -XX:+PrintGCDetails \
 -server \
 -Dcom.sun.management.jmxremote \
 -XX:+UseConcMarkSweepGC \
 -XX:+UseParNewGC \
 -XX:+CMSIncrementalMode \
 -XX:+CMSParallelRemarkEnabled \
 -XX:+CMSIncrementalPacing \
 -XX:NewRatio=3 \
 -Xms30720M \
 -Xmx30720M \
 -Djava.endorsed.dirs=/usr/local/share/apache-tomcat/endorsed \
 -classpath /usr/local/share/apache-tomcat/bin/bootstrap.jar \
 -Dcatalina.base=/usr/local/share/apache-tomcat \
 -Dcatalina.home=/usr/local/share/apache-tomcat \
 -Djava.io.tmpdir=/tmp \
 org.apache.catalina.startup.Bootstrap start

I've tried a few GC option changes from this (been running this way for a couple of years now) - primarily removing CMS incremental mode, as we have 8 cores and remarks on the internet suggest it is only for smaller SMP setups. Removing it did not fix anything.

I've considered that the heap is way too large (30GB out of 40GB) and may not leave enough memory for mmap operations (mmap appears to be used in the field cache). Based on active memory utilization in Java, it seems like I might be able to reduce down to 22GB safely - but I'm not sure if that will help with the CPU issues.

I think the field cache is used for sorting and faceting. I've started to investigate facet.method, but from what I can tell this doesn't influence sorting at all - only facet queries. I've tried setting useFilterForSortedQuery, and it seems to require less field cache but doesn't address the stalling.

Is there something I am overlooking? Perhaps the system is becoming oversubscribed in terms of resources? Thanks for any help that is offered. -- Patrick O'Lone, Director of Software Development, TownNews.com, E-mail: pol...@townnews.com, Phone: 309-743-0809, Fax: 309-743-0830
Re: How to request not directly my SOLR server ?
On 11/26/2013 8:37 AM, Bruno Mannina wrote:
I showed my Solr server to a friend and his first question was: you can query your Solr database directly from your web browser?! Isn't that a security problem? Can anyone who has your request link use your database directly? So I ask the question here. I protect my admin panel, but is it possible to protect a direct request?

Don't make your Solr server directly accessible from the Internet. Only make it accessible from the machines that serve your website and whoever needs to administer it. Solr has no security features. You can use the security features of whatever container is running Solr, but that is outside the scope of this mailing list. Thanks, Shawn
Re: Setting solr.data.dir for SolrCloud instance
On 11/26/2013 9:19 AM, adfel70 wrote: The problem we had was that we tried to run: java -Dsolr.data.dir=/opt/solr/data -Dsolr.solr.home=/opt/solr/home -jar start.jar [...]

The dataDir is a per-core setting; you cannot set it for the entire application. If you make it relative, then it will be relative to each individual instanceDir. It defaults to ./data, so you get $instanceDir/data as the location. Thanks, Shawn
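Since dataDir is per-core, one way to pin each core's data to its own absolute path is to set it in that core's own configuration rather than as a system property. A hedged sketch, assuming Solr 4.4+ core discovery (core.properties); the paths mirror the ones in the thread and are placeholders:

```properties
# /opt/solr/home/collection1_shard1_replica1/core.properties
name=collection1_shard1_replica1
# per-core override; leave unset to get the default $instanceDir/data
dataDir=/opt/solr/data/collection1_shard1_replica1
```

Each core then carries its own absolute dataDir, avoiding the shared-directory collision described above.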
Client-side proxy for Solr 4.5.0
Are there any GOOD client-side solutions to proxy a Solr 4.5.0 instance so that the end-user can see their queries w/o being able to directly access :8983? Applications/frameworks used: - Solr 4.5.0 - AJAX Solr (javascript library) Thank you, Mark IMPORTANT NOTICE: This e-mail message is intended to be received only by persons entitled to receive the confidential information it may contain. E-mail messages sent from Bridgepoint Education may contain information that is confidential and may be legally privileged. Please do not read, copy, forward or store this message unless you are an intended recipient of it. If you received this transmission in error, please notify the sender by reply e-mail and delete the message and any attachments.
Re: Multivalued true Error?
Hi Furkan, In the stock definition of the payload field: http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/collection1/conf/schema.xml?view=markup the analyzer for the payloads field type is a WhitespaceTokenizerFactory followed by a DelimitedPayloadTokenFilterFactory. So if you send it a string "foo$score1 bar$score2 ...", where foo and bar are string tokens, score[12] are payload scores, and $ is your delimiter, the analyzer will tokenize it into multiple payloads and you should be able to run the tests in the blog post. So you shouldn't make it multiValued, AFAIK. -sujit

On Tue, Nov 26, 2013 at 8:44 AM, Furkan KAMACI furkankam...@gmail.com wrote: Hi; I've ported this example from Scala into Java [...]
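For reference, the stock field type Sujit describes looks roughly like this in the example schema (the "|" delimiter and float encoder shown here are the stock defaults; verify against your own schema.xml):

```xml
<fieldType name="payloads" stored="false" indexed="true" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- input "foo|1.0 bar|2.0" becomes tokens foo and bar,
         each carrying a float payload - one field value, many payloads -->
    <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float" delimiter="|"/>
  </analyzer>
</fieldType>
```

Because one single-valued string already yields multiple payloaded tokens, multiValued=true is not needed for this pattern.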
RE: Solr 3.6.1 stalling with high CPU and blocking on field cache
I am not completely sure about that, but if I remember correctly (it has been more than a year since I did that, and I was lazy enough not to take notes :( ), it helped that I reduced the percentage of the heap given to the permanent generation (somehow, more GC on a smaller permanent gen, but that GC is not blocking the system, and it may be that it prevents really large GCs - at the cost of more small ones). But it is far from sound advice; it is just a somewhat distant memory, and I could have mixed things up (been doing many other things in between), so my advice could well be misleading (and make sure that your heap is still big enough - once you get below a reasonable value, nothing will help). P.S. if it worked for you, just let us know. Regards, Patrice Monroe Pustavrh, Software developer, Bisnode Slovenia d.o.o.

-----Original Message----- From: Patrick O'Lone [mailto:pol...@townnews.com] Sent: Tuesday, November 26, 2013 5:59 PM To: solr-user@lucene.apache.org Subject: Solr 3.6.1 stalling with high CPU and blocking on field cache [full text quoted above]
SOLR Master-Slave Repeater with Load balancer
We are trying to set up the Solr master-slave repeater model, where we will have two Solr servers, say S1 and S2, and a load balancer LB to route all requests to either S1 or S2. S1 and S2 each act as both master and slave (repeater). If, in both Solr servers' solrconfig.xml, we give the load balancer host name and port number as the master URL, then at some point there will be self-polling: i.e., if the LB is configured so that all its requests are routed to S1, then while polling, S1 -> LB -> S1 and S2 -> LB -> S1. Do you see any issue with this self-polling (S1 -> LB -> S1)? We are mainly trying to achieve high availability, as we don't want to use SolrCloud. Thanks in advance
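For reference, a repeater is a core whose ReplicationHandler carries both a master and a slave section. A hedged sketch with the masterUrl pointing at the load balancer as described; host, port, core name, and confFiles are placeholders:

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
  <lst name="slave">
    <!-- polling through the LB means a node may end up polling itself,
         which is the scenario the question raises -->
    <str name="masterUrl">http://lb-host:8983/solr/core1/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

When a node polls itself through the LB, the index versions match and no copy should occur, but whether that is safe long-term is exactly the question posed here.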
RE: Client-side proxy for Solr 4.5.0
I don't think you mean a client-side proxy. You need a server-side layer, such as a normal web application or a good proxy. We use Nginx; it is very fast and very feature-rich. Its config scripting is usually enough to restrict access and limit input parameters. We also use Nginx's embedded Perl and Lua scripting, besides its config scripting, to implement more difficult logic.

-----Original message----- From: Reyes, Mark mark.re...@bpiedu.com Sent: Tuesday 26th November 2013 19:27 To: solr-user@lucene.apache.org Subject: Client-side proxy for Solr 4.5.0 [quoted message trimmed]
TermVectorComponent NullPointerException
Hi, I am working on using the term vector component with Solr 4.2.1. If I use Solr in a multicore environment, then I get a NullPointerException. However, if I use a single core as mentioned at http://wiki.apache.org/solr/TermVectorComponent then I do not get any exception; however, the response I get does not contain any term information. So, did anybody else face this issue? With Regards, Ankur
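For comparison, the wiring from the wiki page linked above looks roughly like this (a sketch; the /tvrh handler name is conventional). In a multicore setup, each core's solrconfig.xml needs its own copy, and the fields must be indexed with termVectors="true" or the response will contain no term information:

```xml
<searchComponent name="tvComponent" class="solr.TermVectorComponent"/>
<requestHandler name="/tvrh" class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="tv">true</bool>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>
```

A missing per-core component registration or a field without term vectors are two plausible causes of the empty/NPE behavior described, worth ruling out first.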
RE: Solr 3.6.1 stalling with high CPU and blocking on field cache
My gut instinct is that your heap size is way too high. Try decreasing it to something like 5-10G. I know you say it uses more than that, but that just seems bizarre unless you're doing something like faceting and/or sorting on every field. -Michael

-----Original Message----- From: Patrick O'Lone [mailto:pol...@townnews.com] Sent: Tuesday, November 26, 2013 11:59 AM To: solr-user@lucene.apache.org Subject: Solr 3.6.1 stalling with high CPU and blocking on field cache [full text quoted above]
Re: Client-side proxy for Solr 4.5.0
Perhaps what you want is a transparent proxy? You could use nginx, squid, varnish, etc. We've been evaluating varnish as a possibility to run in front of our Solr server, to take advantage of the HTTP caching that varnish does so well. Greetings!

----- Original message ----- From: Markus Jelsma markus.jel...@openindex.io To: solr-user@lucene.apache.org Sent: Tuesday, November 26, 2013 13:53:31 Subject: RE: Client-side proxy for Solr 4.5.0 [quoted message trimmed]
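A hedged sketch of the reverse-proxy approach discussed in this thread: expose only the read-side select handler and refuse everything else, so browsers never reach :8983 directly. Core name, paths, and upstream port are placeholders, and a real deployment would also want parameter limits:

```nginx
# inside a server { } block
location /solr/collection1/select {
    proxy_pass http://127.0.0.1:8983;   # only read queries reach Solr
}
location /solr/ {
    deny all;                           # update/admin handlers are refused
}
```

The same shape works in squid or varnish; the essential point is that Solr itself stays bound to an address the public cannot reach.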
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
We do perform a lot of sorting - on multiple fields in fact. We have different kinds of Solr configurations - our news searches do little with regards to faceting, but heavily sort. We provide classified ad searches and that heavily uses faceting. I might try reducing the JVM memory some and amount of perm generation as suggested earlier. It feels like a GC issue and loading the cache just happens to be the victim of a stop-the-world event at the worse possible time. My gut instinct is that your heap size is way too high. Try decreasing it to like 5-10G. I know you say it uses more than that, but that just seems bizarre unless you're doing something like faceting and/or sorting on every field. -Michael -Original Message- From: Patrick O'Lone [mailto:pol...@townnews.com] Sent: Tuesday, November 26, 2013 11:59 AM To: solr-user@lucene.apache.org Subject: Solr 3.6.1 stalling with high CPU and blocking on field cache I've been tracking a problem in our Solr environment for awhile with periodic stalls of Solr 3.6.1. I'm running up to a wall on ideas to try and thought I might get some insight from some others on this list. The load on the server is normally anywhere between 1-3. It's an 8-core machine with 40GB of RAM. I have about 25GB of index data that is replicated to this server every 5 minutes. It's taking about 200 connections per second and roughly every 5-10 minutes it will stall for about 30 seconds to a minute. The stall causes the load to go to as high as 90. It is all CPU bound in user space - all cores go to 99% utilization (spinlock?). When doing a thread dump, the following line is blocked in all running Tomcat threads: org.apache.lucene.search.FieldCacheImpl$Cache.get ( FieldCacheImpl.java:230 ) Looking the source code in 3.6.1, that is a function call to syncronized() which blocks all threads and causes the backlog. I've tried to correlate these events to the replication events - but even with replication disabled - this still happens. 
We run multiple data centers using Solr and I was comparing garbage collection behavior between them, and noted that the old generation is collected very differently in this data center versus the others. Here, the old generation is collected in one massive collection event (several gigabytes' worth) - the other data center is more sawtoothed and collects only 500MB-1GB at a time. Here are my parameters to java (the same in all environments): /usr/java/jre/bin/java \ -verbose:gc \ -XX:+PrintGCDetails \ -server \ -Dcom.sun.management.jmxremote \ -XX:+UseConcMarkSweepGC \ -XX:+UseParNewGC \ -XX:+CMSIncrementalMode \ -XX:+CMSParallelRemarkEnabled \ -XX:+CMSIncrementalPacing \ -XX:NewRatio=3 \ -Xms30720M \ -Xmx30720M \ -Djava.endorsed.dirs=/usr/local/share/apache-tomcat/endorsed \ -classpath /usr/local/share/apache-tomcat/bin/bootstrap.jar \ -Dcatalina.base=/usr/local/share/apache-tomcat \ -Dcatalina.home=/usr/local/share/apache-tomcat \ -Djava.io.tmpdir=/tmp \ org.apache.catalina.startup.Bootstrap start I've tried a few GC option changes from this (it's been running this way for a couple of years now) - primarily removing CMS incremental mode, since we have 8 cores and remarks on the internet suggest it is only for smaller SMP setups. Removing it did not fix anything. I've considered that the heap is way too large (30GB out of 40GB) and may not leave enough memory for mmap operations (mmap appears to be used by the field cache). Based on active memory utilization in Java, it seems like I might be able to reduce down to 22GB safely - but I'm not sure whether that will help with the CPU issues. I think the field cache is used for sorting and faceting. I've started to investigate facet.method, but from what I can tell it doesn't influence sorting at all - only facet queries. I've tried setting useFilterForSortedQuery, and it seems to require less field cache but doesn't address the stalling issues. Is there something I am overlooking? 
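For comparison, a non-incremental CMS configuration with a smaller heap is the commonly suggested alternative for a machine with this many cores. A sketch only - the heap size and occupancy fraction below are illustrative assumptions, not tuned recommendations:

```
/usr/java/jre/bin/java \
  -verbose:gc -XX:+PrintGCDetails \
  -server \
  -XX:+UseConcMarkSweepGC \
  -XX:+UseParNewGC \
  -XX:+CMSParallelRemarkEnabled \
  -XX:CMSInitiatingOccupancyFraction=75 \
  -XX:+UseCMSInitiatingOccupancyOnly \
  -Xms8g -Xmx8g \
  org.apache.catalina.startup.Bootstrap start
# (jmxremote, endorsed dirs, classpath and catalina options as before)
```

Dropping the -XX:+CMSIncrementalMode/-XX:+CMSIncrementalPacing pair and pinning the old-generation trigger with an explicit occupancy fraction tends to produce the sawtooth collection pattern the other data center shows.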
Perhaps the system is becoming oversubscribed in terms of resources? Thanks for any help that is offered. -- Patrick O'Lone Director of Software Development TownNews.com E-mail ... pol...@townnews.com Phone 309-743-0809 Fax .. 309-743-0830
Re: How to request not directly my SOLR server ?
On 26/11/2013 18:52, Shawn Heisey wrote: On 11/26/2013 8:37 AM, Bruno Mannina wrote: I showed my Solr server to a friend and his first question was: you can query your Solr database directly from your browser?! Isn't that a security problem? Anyone who has your request link can use your database directly? So I ask the question here. I have protected my admin panel, but is it possible to protect direct requests? Don't make your Solr server directly accessible from the Internet. Only make it accessible from the machines that serve your website and whoever needs to administer it. Solr has no security features. You can use the security features of whatever container is running Solr, but that is outside the scope of this mailing list. Thanks, Shawn Thanks a lot for this information, Bruno
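To illustrate Shawn's advice, one common approach is a host firewall that only lets the web servers reach Solr's port. A sketch, assuming Solr on its default port 8983 and a hypothetical web-server address 192.168.1.10:

```
# allow the web/application server to reach Solr
iptables -A INPUT -p tcp --dport 8983 -s 192.168.1.10 -j ACCEPT
# drop everything else aimed at Solr's port
iptables -A INPUT -p tcp --dport 8983 -j DROP
```

With rules like these in place, end users only ever talk to the website, and the website talks to Solr on their behalf.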
Re: Major GC does not reduce the old gen size
Posting on behalf of neoman: alt_schema.xml http://lucene.472066.n3.nabble.com/file/n4103405/alt_schema.xml Details about the index: number of documents: 350 million; number of shards: 4; number of nodes: 8; replicationFactor: 1 (default/no additional replication); total RAM on each server node: 16 GB; number of documents per shard: close to 80 million. Documents get added to the index every 15 minutes in a batch job and yes, as you mentioned, updates to documents are also happening -- View this message in context: http://lucene.472066.n3.nabble.com/Major-GC-does-not-reduce-the-old-gen-size-tp4096880p4103405.html Sent from the Solr - User mailing list archive at Nabble.com.
Is it possible to have only fq in my solr query?
Hi, I am preparing a Solr query in which I only give the fq parameter - I don't give any q parameter. If I execute such a query, where it only has fq, it does not return any docs; that is, it returns 0 docs. So, is it always mandatory to have a q parameter in a Solr query? If so, then I think I should have something like q=*:* and fq=field:value. Please explain. Thanks Radha -- View this message in context: http://lucene.472066.n3.nabble.com/Is-it-possible-to-have-only-fq-in-my-solr-query-tp4103429.html Sent from the Solr - User mailing list archive at Nabble.com.
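To illustrate, the match-all main query plus a filter can be assembled like this (the Solr URL, collection name, and field name are placeholder assumptions):

```python
from urllib.parse import urlencode

def build_query(fq, base="http://localhost:8983/solr/collection1/select"):
    # q=*:* matches every document; fq then restricts the result set
    # without affecting relevance scoring.
    params = {"q": "*:*", "fq": fq, "wt": "json"}
    return base + "?" + urlencode(params)

print(build_query("field:value"))
```

Note that fq alone only narrows a result set - it never selects documents by itself, which is why a filter-only request comes back empty.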
Re: Adding new field after data is already indexed
Check Solr: Add new fields with Default Value for Existing Documents http://lifelongprogrammer.blogspot.com/2013/06/solr-use-doctransformer-to-change.html If we only need to search and display the new fields, we can do the following steps. 1. Add the new field definition in schema.xml: <field name="newField" type="tint" indexed="true" stored="true" default="-1"/> 2. Update the search query: when searching for the default value of newField, also search for the null value: -(-newField:defaultValue AND newField:[* TO *]) 3. Use a DocTransformer to add the default value when there is no value in that field for old data. Some functions may not work, such as sort and stats. -- View this message in context: http://lucene.472066.n3.nabble.com/Adding-new-field-after-data-is-already-indexed-tp1862575p4103440.html Sent from the Solr - User mailing list archive at Nabble.com.
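The query rewrite in step 2 can be generated with a small helper. A sketch only - the field name and default value below are assumptions matching the example schema:

```python
def default_or_missing(field, default):
    # Solr idiom for "field equals the default OR field has no value":
    # the inner clause matches documents with a non-default value in the
    # field, and the outer - negates that set.
    return "-(-{0}:{1} AND {0}:[* TO *])".format(field, default)

print(default_or_missing("newField", -1))
# -(-newField:-1 AND newField:[* TO *])
```

This keeps older documents (which have no stored value for the new field) visible alongside newer documents that carry the explicit default.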
Re: In a functon query, I can't get the ValueSource when extend ValueSourceParser
Thank you, kydryavtsev andrey! You give me the right solution. -- View this message in context: http://lucene.472066.n3.nabble.com/In-a-functon-query-I-can-t-get-the-ValueSource-when-extend-ValueSourceParser-tp4103026p4103449.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Storing solr results in excel
Thank you for the suggestion. Finally I got a solution: I converted the document into JSON format and stored it in a string, String url = JSONUtil.toJSON(document.get(url)); then I placed the string values in the Excel file. -- View this message in context: http://lucene.472066.n3.nabble.com/Storing-solr-results-in-excel-tp4103237p4103450.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr Autowarmed queries on jvm crash
Hi, What happens to the autowarmed queries if the server is shut down or the JVM crashes? Is there any possibility to recover them from physical storage (transaction log?) Thanks, Sinduja
Re: Solr Autowarmed queries on jvm crash
On 11/26/2013 11:15 PM, Prasi S wrote: What happens to the autowarmed queries if the server is shut down / JVM crashes? Is there any possibility to recover that from physical storage (transaction log?) The transaction log only contains data that is sent to Solr for indexing. Cached query data is lost when the program exits, so it cannot be used for autowarming. If the logs are set to at least INFO severity, they will contain a query history, but Solr doesn't have any way to pull those back out of the logfile and re-use them. If firstSearcher and/or newSearcher warming queries are defined in solrconfig.xml, then those will be re-run when Solr starts back up. Thanks, Shawn
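For reference, such warming queries live in solrconfig.xml as listeners on the firstSearcher and newSearcher events. A minimal sketch - the queries themselves are placeholders to be replaced with ones representative of real traffic:

```
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str><str name="rows">10</str></lst>
  </arr>
</listener>
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">some common query</str></lst>
  </arr>
</listener>
```

firstSearcher runs on startup when there is no previous searcher to autowarm from; newSearcher runs on every commit/searcher swap in addition to cache autowarming.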
Re: Solr Autowarmed queries on jvm crash
Thanks Shawn for the reply. In that case, when the system is restarted, a new searcher would be opened? It cannot be populated from its previous searchers? I may be wrong here, but I wanted to confirm. Thanks, Prasi On Wed, Nov 27, 2013 at 12:04 PM, Shawn Heisey s...@elyograg.org wrote: On 11/26/2013 11:15 PM, Prasi S wrote: What happens to the autowarmed queries if the server is shut down / JVM crashes? Is there any possibility to recover that from physical storage (transaction log?) The transaction log only contains data that is sent to Solr for indexing. Cached query data is lost when the program exits, so it cannot be used for autowarming. If the logs are set to at least INFO severity, they will contain a query history, but Solr doesn't have any way to pull those back out of the logfile and re-use them. If firstSearcher and/or newSearcher warming queries are defined in solrconfig.xml, then those will be re-run when Solr starts back up. Thanks, Shawn
Re: syncronization between replicas
anyone? -- View this message in context: http://lucene.472066.n3.nabble.com/syncronization-between-replicas-tp4103046p4103455.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Autowarmed queries on jvm crash
On 11/26/2013 11:49 PM, Prasi S wrote: Thanks Shawn for the reply. In that case, when the system is restarted, a new searcher would be opened? It cannot populate from its previous searchers? I may be wrong here, but i wanted to confirm. There are no previous searchers when Solr first starts up. At startup, any queries defined as part of the firstSearcher event are executed. Each time a new searcher is created, any queries defined as part of the newSearcher event are executed. Thanks, Shawn
Re: Solr Autowarmed queries on jvm crash
Ok. I have started Solr for the first time and have autowarmed a few queries. Now my JVM crashes for some other reason. Then I restart Solr. What would happen to the autowarmed queries, cache, and old searcher now? Thanks, Prasi On Wed, Nov 27, 2013 at 12:32 PM, Shawn Heisey s...@elyograg.org wrote: On 11/26/2013 11:49 PM, Prasi S wrote: Thanks Shawn for the reply. In that case, when the system is restarted, a new searcher would be opened? It cannot be populated from its previous searchers? I may be wrong here, but I wanted to confirm. There are no previous searchers when Solr first starts up. At startup, any queries defined as part of the firstSearcher event are executed. Each time a new searcher is created, any queries defined as part of the newSearcher event are executed. Thanks, Shawn