unknown _stream_source_info while indexing rich doc in solr
I am using Solr 4.2 on Windows 7. My schema is:

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="author" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="comments" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="keywords" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="contents" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="title" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="revision_number" type="string" indexed="true" stored="true" multiValued="false"/>
<dynamicField name="ignored_*" type="ignored" indexed="false" stored="false" multiValued="true"/>

solrconfig.xml:

<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">contents</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <str name="captureAttr">true</str>
  </lst>
</requestHandler>

When I execute:

curl "http://localhost:8080/solr/update/extract?literal.id=1&commit=true" -F "myfile=@abc.txt"

I get the error: unknown field "ignored_stream_source_info". I referred to Solr Cookbook 3.1 and Solr Cookbook 4, but the error is not resolved. Please help me.

--
View this message in context: http://lucene.472066.n3.nabble.com/unknown-stream-source-info-while-indexing-rich-doc-in-solr-tp4088136.html
Sent from the Solr - User mailing list archive at Nabble.com.
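For reference, the dynamicField above only works if the "ignored" field type it references is actually defined in schema.xml; if it is missing, unknown-field errors like the one reported appear. A sketch modeled on the stock example schema (exact attributes may differ per Solr version):

```xml
<!-- Sketch: the "ignored" type the dynamicField refers to must itself exist
     in schema.xml; without it, Tika metadata fields remapped by uprefix
     (e.g. ignored_stream_source_info) have no matching field definition. -->
<fieldType name="ignored" class="solr.StrField"
           indexed="false" stored="false" multiValued="true"/>
```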
Re: dataimporter tika doesn't extract certain div
So could I just nest it in an XPathEntityProcessor to filter the HTML, or is there something like XPath for Tika?

<entity name="htm" processor="XPathEntityProcessor" url="${rec.file}" forEach="/div[@id='content']" dataSource="main">
  <entity name="tika" processor="TikaEntityProcessor" url="${htm}" dataSource="dataUrl" onError="skip" htmlMapper="identity" format="html">
    <field column="text"/>
  </entity>
</entity>

But now I don't know how to pass the text to Tika; what do I put in url and dataSource?

On 3. Sep 2013, at 5:56 PM, Shalin Shekhar Mangar wrote:

I don't know much about Tika, but in the example data-config.xml that you posted, the xpath attribute on the field "text" won't work because the xpath attribute is used only by an XPathEntityProcessor.

On Thu, Aug 29, 2013 at 10:20 PM, Andreas Owen a...@conx.ch wrote:

I want Tika to only index the content in <div id="content">...</div> for the field "text". Unfortunately it's indexing the whole page. Can't XPath do this?

data-config.xml:

<dataConfig>
  <dataSource type="BinFileDataSource" name="data"/>
  <dataSource type="BinURLDataSource" name="dataUrl"/>
  <dataSource type="URLDataSource" name="main"/>
  <document>
    <entity name="rec" processor="XPathEntityProcessor" url="http://127.0.0.1/tkb/internet/docImportUrl.xml" forEach="/docs/doc" dataSource="main"> <!-- transformer="script:GenerateId" -->
      <field column="title" xpath="//title"/>
      <field column="id" xpath="//id"/>
      <field column="file" xpath="//file"/>
      <field column="path" xpath="//path"/>
      <field column="url" xpath="//url"/>
      <field column="Author" xpath="//author"/>
      <entity name="tika" processor="TikaEntityProcessor" url="${rec.path}${rec.file}" dataSource="dataUrl" onError="skip" htmlMapper="identity" format="html">
        <field column="text" xpath="//div[@id='content']"/>
      </entity>
    </entity>
  </document>
</dataConfig>

--
Regards,
Shalin Shekhar Mangar.
Re: DIH + Solr Cloud
Hey Alejandro, I guess it depends on what you call "more than one instance". The request handlers are at the core level, not the Solr instance/global level, and within each of those cores you could have one or more data import handlers. Most setups have one DIH per core at the handler location /dataimport, but I believe you could have several, i.e. /dataimport2, /dataimport3, if you had different DIH configs for each handler. Within a single data import handler you can have several entities, which are what tell the DIH process how to get/index the data. What you can do here is have several entities that construct your index, and execute those entities with several separate HTTP calls to the DIH, thus creating more than one instance of the DIH process within one core and one DIH handler. I.e.:

curl "http://localhost:8983/solr/core1/dataimport?command=full-import&entity=suppliers"
curl "http://localhost:8983/solr/core1/dataimport?command=full-import&entity=parts"
curl "http://localhost:8983/solr/core1/dataimport?command=full-import&entity=companies"

http://wiki.apache.org/solr/DataImportHandler#Commands

Cheers, Tim

On 03/09/13 09:25 AM, Alejandro Calbazana wrote: Hi, Quick question about data import handlers in SolrCloud. Does anyone use more than one instance to support the DIH process? Or is the typical setup to have one box set up as only the DIH and keep this responsibility outside of the SolrCloud environment? I'm just trying to get a picture of how this is typically deployed. Thanks! Alejandro
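A sketch of what the several-handlers-per-core layout described above could look like in solrconfig.xml (the handler names and per-handler config filenames here are illustrative, not from the thread):

```xml
<!-- Sketch: two independent DIH endpoints in one core, each with its own
     DIH config file. /dataimport and /dataimport2 are just conventions. -->
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dih-suppliers.xml</str>
  </lst>
</requestHandler>
<requestHandler name="/dataimport2"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">dih-parts.xml</str>
  </lst>
</requestHandler>
```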
Re: Change the score of a document based on the *value* of a multifield using dismax
Thanks a lot David. I will try it ;) -- View this message in context: http://lucene.472066.n3.nabble.com/Change-the-score-of-a-document-based-on-the-value-of-a-multifield-tp4087503p4088145.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: dataimporter tika doesn't extract certain div
No, that wouldn't work. It seems that you probably need a custom Transformer to extract the right div content. I do not know if TikaEntityProcessor supports such a thing.

On Wed, Sep 4, 2013 at 12:38 PM, Andreas Owen a...@conx.ch wrote:

So could I just nest it in an XPathEntityProcessor to filter the HTML, or is there something like XPath for Tika?

<entity name="htm" processor="XPathEntityProcessor" url="${rec.file}" forEach="/div[@id='content']" dataSource="main">
  <entity name="tika" processor="TikaEntityProcessor" url="${htm}" dataSource="dataUrl" onError="skip" htmlMapper="identity" format="html">
    <field column="text"/>
  </entity>
</entity>

But now I don't know how to pass the text to Tika; what do I put in url and dataSource?

On 3. Sep 2013, at 5:56 PM, Shalin Shekhar Mangar wrote:

I don't know much about Tika, but in the example data-config.xml that you posted, the xpath attribute on the field "text" won't work because the xpath attribute is used only by an XPathEntityProcessor.

On Thu, Aug 29, 2013 at 10:20 PM, Andreas Owen a...@conx.ch wrote:

I want Tika to only index the content in <div id="content">...</div> for the field "text". Unfortunately it's indexing the whole page. Can't XPath do this?

data-config.xml:

<dataConfig>
  <dataSource type="BinFileDataSource" name="data"/>
  <dataSource type="BinURLDataSource" name="dataUrl"/>
  <dataSource type="URLDataSource" name="main"/>
  <document>
    <entity name="rec" processor="XPathEntityProcessor" url="http://127.0.0.1/tkb/internet/docImportUrl.xml" forEach="/docs/doc" dataSource="main"> <!-- transformer="script:GenerateId" -->
      <field column="title" xpath="//title"/>
      <field column="id" xpath="//id"/>
      <field column="file" xpath="//file"/>
      <field column="path" xpath="//path"/>
      <field column="url" xpath="//url"/>
      <field column="Author" xpath="//author"/>
      <entity name="tika" processor="TikaEntityProcessor" url="${rec.path}${rec.file}" dataSource="dataUrl" onError="skip" htmlMapper="identity" format="html">
        <field column="text" xpath="//div[@id='content']"/>
      </entity>
    </entity>
  </document>
</dataConfig>

--
Regards,
Shalin Shekhar Mangar.

--
Regards,
Shalin Shekhar Mangar.
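For illustration, the kind of extraction logic such a custom Transformer would run might look like the sketch below. Everything here is hypothetical (the DIH wiring is omitted), and the regex approach is deliberately simple — it cannot handle nested divs, and a real implementation would be safer with an HTML parser such as Jsoup:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContentDivExtractor {
    // Illustrative only: pull the inner text of <div id="content">...</div>.
    // A real Transformer would call something like this from transformRow().
    public static String extract(String html) {
        Matcher m = Pattern.compile(
                "<div[^>]*\\bid=[\"']content[\"'][^>]*>(.*?)</div>",
                Pattern.DOTALL | Pattern.CASE_INSENSITIVE).matcher(html);
        if (!m.find()) return "";
        // Drop any markup left inside the div and collapse whitespace.
        return m.group(1).replaceAll("<[^>]+>", " ")
                         .replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String page = "<html><body><div id=\"nav\">menu</div>"
                + "<div id=\"content\"><p>Only this text</p></div></body></html>";
        System.out.println(ContentDivExtractor.extract(page));
    }
}
```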
Re: Measuring SOLR performance
Hi Roman, Ok, I will. Thanks! Cheers, Dmitry On Tue, Sep 3, 2013 at 4:46 PM, Roman Chyla roman.ch...@gmail.com wrote: Hi Dmitry, Thanks for the feedback. Yes, it is indeed jmeter issue (or rather, the issue of the plugin we use to generate charts). You may want to use the github for whatever comes next https://github.com/romanchyla/solrjmeter/issues Cheers, roman On Tue, Sep 3, 2013 at 7:54 AM, Dmitry Kan solrexp...@gmail.com wrote: Hi Roman, Thanks, the --additionalSolrParams was just what I wanted and works fine. BTW, if you have some special bug tracking forum for the tool, I'm happy to submit questions / bug reports there. Otherwise, this email list is ok (for me at least). One other thing I have noticed in the err logs was a series of messages of this sort upon generating the perf test report. Seems to be jmeter related (the err messages disappear, if extra lib dir is present under ext directory). java.lang.Throwable: Could not access /home/dmitry/projects/lab/solrjmeter7/solrjmeter/jmeter/lib/ext/lib at kg.apc.cmd.UniversalRunner.buildUpdatedClassPath(UniversalRunner.java:109) at kg.apc.cmd.UniversalRunner.clinit(UniversalRunner.java:55) at kg.apc.cmd.UniversalRunner.buildUpdatedClassPath(UniversalRunner.java:109) at kg.apc.cmd.UniversalRunner.clinit(UniversalRunner.java:55) at kg.apc.cmd.UniversalRunner.buildUpdatedClassPath(UniversalRunner.java:109) at kg.apc.cmd.UniversalRunner.clinit(UniversalRunner.java:55) On Tue, Sep 3, 2013 at 2:50 AM, Roman Chyla roman.ch...@gmail.com wrote: Hi Dmitry, If it is something you want to pass with every request (which is my use case), you can pass it as additional solr params, eg. python solrjmeter --additionalSolrParams=fq=other_field:bar+facet=true+facet.field=facet_field_name the string should be url encoded. If it is something that changes with every request, you should modify the jmeter test. 
If you open/load it with the jmeter GUI, in the HTTP request processor you can define other additional fields to pass with the request. These values can come from the CSV file; you'll see an example of how to use that when you open the test definition file. Cheers, roman On Mon, Sep 2, 2013 at 3:12 PM, Dmitry Kan solrexp...@gmail.com wrote: Hi Erick, Agree, this is perfectly fine to mix them in Solr. But my question is about the solrjmeter input query format. I just couldn't find a suitable example on solrjmeter's github. Dmitry On Mon, Sep 2, 2013 at 5:40 PM, Erick Erickson erickerick...@gmail.com wrote: filter and facet queries can be freely intermixed, it's not a problem. What problem are you seeing when you try this? Best, Erick On Mon, Sep 2, 2013 at 7:46 AM, Dmitry Kan solrexp...@gmail.com wrote: Hi Roman, What's the format for running the facet+filter queries? Would something like this work: field:foo =50 fq=other_field:bar facet=true facet.field=facet_field_name Thanks, Dmitry On Fri, Aug 23, 2013 at 2:34 PM, Dmitry Kan solrexp...@gmail.com wrote: Hi Roman, With adminPath=/admin or adminPath=/admin/cores, no. Interestingly enough, though, I can access http://localhost:8983/solr/statements/admin/system But I can access http://localhost:8983/solr/admin/cores only with adminPath=/admin/cores (which suggests that this is the right value to be used for cores), and not with adminPath=/admin. Bottom line, this core configuration is not self-evident. Dmitry On Fri, Aug 23, 2013 at 4:18 AM, Roman Chyla roman.ch...@gmail.com wrote: Hi Dmitry, So it seems solrjmeter should not assume the adminPath - perhaps it needs to be passed as an argument. When you set the adminPath, are you able to access localhost:8983/solr/statements/admin/cores ? roman On Wed, Aug 21, 2013 at 7:36 AM, Dmitry Kan solrexp...@gmail.com wrote: Hi Roman, I have noticed a difference with different solr.xml config contents.
It is probably legit, but I thought to let you know (tests run on a fresh checkout as of today). As mentioned before, I have two cores configured in solr.xml. If the file is:

[code]
<solr persistent="false">
  <!-- adminPath: RequestHandler path to manage cores. If 'null' (or absent), cores will not be manageable via
Strange behaviour with single word and phrase
I wonder if anyone could point me in the right direction please? If I search on the phrase "the toolkit" I get hits containing that phrase, but also hits that have the word 'the' before the word 'toolkit', no matter how far apart they are. Also, if I search on the word 'the' there are no hits at all. Thanks, Alistair - mov eax,1 mov ebx,0 int 80
Re: Starting Solr in Tomcat with specifying ZK host(s)
Thanks Shawn! Indeed, setting JAVA_OPTS and restarting Tomcat did the trick. Currently I'm exploring and experimenting with SolrCloud, so I used only one ZK. For a production environment your suggestion would, of course, be mandatory. -- View this message in context: http://lucene.472066.n3.nabble.com/Starting-Solr-in-Tomcat-with-specifying-ZK-host-s-tp4087916p4088164.html Sent from the Solr - User mailing list archive at Nabble.com.
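For reference, the JAVA_OPTS approach mentioned above can be sketched like this (the ZooKeeper hostnames and Tomcat path are placeholders, not from the thread):

```shell
# Minimal sketch: point a Tomcat-hosted Solr at the ZooKeeper ensemble
# via a JVM system property before (re)starting Tomcat.
export JAVA_OPTS="$JAVA_OPTS -DzkHost=zk1:2181,zk2:2181,zk3:2181"
# ./bin/catalina.sh stop && ./bin/catalina.sh start   # restart to pick it up
```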
Re: Indexing pdf files - question.
My solrconfig.xml is:

<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">desc</str> <!-- to map this field of my table, which is defined as shown below in schema.xml -->
    <str name="lowernames">true</str>
    <str name="uprefix">attr_</str>
    <str name="captureAttr">true</str>
  </lst>
</requestHandler>
<lib dir="../../extract" regex=".*\.jar"/>

Schema.xml:

<fields>
  <field name="doc_id" type="integer" indexed="true" stored="true" multiValued="false"/>
  <field name="name" type="text" indexed="true" stored="true" multiValued="false"/>
  <field name="path" type="text" indexed="true" stored="true" multiValued="false"/>
  <field name="desc" type="text_split" indexed="true" stored="true" multiValued="false"/>
</fields>
<types>
  <fieldType name="string" class="solr.StrField"/>
  <fieldType name="integer" class="solr.IntField"/>
  <fieldType name="text" class="solr.TextField"/>
</types>
<dynamicField name="*_i" type="integer" indexed="true" stored="true"/>
<uniqueKey>doc_id</uniqueKey>

I have created an extract directory, copied all the required .jar and solr-cell jar files into it, and given its path in the lib tag in solrconfig.xml. When I try this:

curl "http://localhost:8080/solr/update/extract?literal.doc_id=1&commit=true" -F "myfile=@solr-word.pdf"

on Windows 7, I get "/solr/update/extract is not available", and sometimes I get an access denied error. I tried resolving it through the net, but in vain, as all the solutions are related to Linux; I'm working on Windows. Please help me and provide solutions related to Windows. I referred to the Apache Solr 4 Cookbook. Thanks a lot.
solr performance against oracle
Hi, I'm trying to change the data access in the company where I work from Oracle to Solr. So I ran some tests, like this:

In Oracle:

private void go() throws Exception {
    Class.forName("oracle.jdbc.driver.OracleDriver");
    Connection conn = DriverManager.getConnection("XXX");
    PreparedStatement pstmt = conn.prepareStatement(
        "SELECT DS_ROTEIRO FROM cco_roteiro_reg_venda WHERE CD_ROTEIRO=93100689");
    Date initialTime = new Date();
    ResultSet rs = pstmt.executeQuery();
    rs.next();
    String desc = rs.getString(1);
    System.out.println("total time: " + (new Date().getTime() - initialTime.getTime()) + " ms");
    System.out.println(desc);
    rs.close();
    pstmt.close();
    conn.close();
}

And in Solr:

private void go() throws Exception {
    String baseUrl = "http://localhost:8983/solr/";
    this.solrServerUrl = "http://localhost:8983/solr/roteiros/";
    server = new HttpSolrServer(solrUrl);
    String docId = AddOneRoteiroToCollection.docId;
    HttpSolrServer solr = new HttpSolrServer(baseUrl);
    SolrServer solrServer = new HttpSolrServer(solrServerUrl);
    solr.setRequestWriter(new BinaryRequestWriter());
    SolrQuery query = new SolrQuery();
    query.setQuery("(id:" + docId + ")"); // search by id
    query.addField("id");
    query.addField("descricaoRoteiro");
    extrairEApresentarResultados(query);
}

private void extrairEApresentarResultados(SolrQuery query) throws SolrServerException {
    Date initialTime = new Date();
    QueryResponse rsp = server.query(query);
    SolrDocumentList docs = rsp.getResults();
    long now = new Date().getTime() - initialTime.getTime(); // HERE I'M CHECKING THE SOLR RESPONSE TIME
    for (SolrDocument solrDocument : docs) {
        System.out.println(solrDocument);
    }
    System.out.println("Total de documentos encontrados: " + docs.size());
    System.out.println("Tempo total: " + now + " ms");
}

descricaoRoteiro is the same data that I'm getting in both, using the PK CD_ROTEIRO, which is in Solr under the name id (it's the same data). Solr is on the same machine, and Solr and Oracle have the same number of records (around 800 thousand).

Solr always returns the data in around 150-200 ms (from localhost), but Oracle returns in around 20 ms (and the Oracle server is in another company; I'm using a dedicated link to access it). How can I tell my managers that I'd like to use Solr? I saw that filters in Solr take around 6-10 ms, but they're a query inside another query that was returned previously. Thanks for any help. I'd like so much to use Solr, but I really don't know how to explain this to my managers. -- Sergio Stateri Jr. stat...@gmail.com
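One caveat when comparing these numbers: a single cold request mostly measures connection setup and cache warm-up, not steady-state latency. A fairer comparison warms up first and takes the median over many runs. A minimal, self-contained sketch of such a harness (the dummy workload below stands in for the real Solr or JDBC call):

```java
import java.util.Arrays;

public class QueryBench {
    // Time a query several times and report the median latency in ms.
    // A single cold call (as in the snippets above) largely measures HTTP
    // connection setup and warm-up, which can make Solr look slower than it is.
    public static long medianMillis(Runnable query, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) query.run();    // warm caches first
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long t0 = System.nanoTime();
            query.run();                                  // e.g. server.query(query)
            samples[i] = (System.nanoTime() - t0) / 1_000_000;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) {
        // Dummy workload standing in for the real call being benchmarked.
        long median = medianMillis(() -> {
            double x = 0;
            for (int i = 1; i < 10_000; i++) x += Math.sqrt(i);
        }, 3, 11);
        System.out.println("median ms: " + median);
    }
}
```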
Re: solr performance against oracle
You said nothing about your environments (e.g. operating systems, what kind of Oracle installation you have, what kind of Solr installation, how much data in the database, how many documents in the index, RAM for Solr, for Oracle, for the OS, and the hardware in general... and so on). Anyway... a migration from Oracle to Solr? That is, you're going to throw Oracle out the window and completely replace it with Solr? I would consider other aspects first, before your performance test... unless you have one flat table in Oracle, you should explain to your manager that there's a lot of work that needs to be done for that kind of migration (e.g. collecting all query requirements, denormalization). Best, Gazza

On 09/04/2013 02:06 PM, Sergio Stateri wrote:

Hi, I'm trying to change the data access in the company where I work from Oracle to Solr. So I ran some tests, like this:

In Oracle:

private void go() throws Exception {
    Class.forName("oracle.jdbc.driver.OracleDriver");
    Connection conn = DriverManager.getConnection("XXX");
    PreparedStatement pstmt = conn.prepareStatement(
        "SELECT DS_ROTEIRO FROM cco_roteiro_reg_venda WHERE CD_ROTEIRO=93100689");
    Date initialTime = new Date();
    ResultSet rs = pstmt.executeQuery();
    rs.next();
    String desc = rs.getString(1);
    System.out.println("total time: " + (new Date().getTime() - initialTime.getTime()) + " ms");
    System.out.println(desc);
    rs.close();
    pstmt.close();
    conn.close();
}

And in Solr:

private void go() throws Exception {
    String baseUrl = "http://localhost:8983/solr/";
    this.solrServerUrl = "http://localhost:8983/solr/roteiros/";
    server = new HttpSolrServer(solrUrl);
    String docId = AddOneRoteiroToCollection.docId;
    HttpSolrServer solr = new HttpSolrServer(baseUrl);
    SolrServer solrServer = new HttpSolrServer(solrServerUrl);
    solr.setRequestWriter(new BinaryRequestWriter());
    SolrQuery query = new SolrQuery();
    query.setQuery("(id:" + docId + ")"); // search by id
    query.addField("id");
    query.addField("descricaoRoteiro");
    extrairEApresentarResultados(query);
}

private void extrairEApresentarResultados(SolrQuery query) throws SolrServerException {
    Date initialTime = new Date();
    QueryResponse rsp = server.query(query);
    SolrDocumentList docs = rsp.getResults();
    long now = new Date().getTime() - initialTime.getTime(); // HERE I'M CHECKING THE SOLR RESPONSE TIME
    for (SolrDocument solrDocument : docs) {
        System.out.println(solrDocument);
    }
    System.out.println("Total de documentos encontrados: " + docs.size());
    System.out.println("Tempo total: " + now + " ms");
}

descricaoRoteiro is the same data that I'm getting in both, using the PK CD_ROTEIRO, which is in Solr under the name id (it's the same data). Solr is on the same machine, and Solr and Oracle have the same number of records (around 800 thousand). Solr always returns the data in around 150-200 ms (from localhost), but Oracle returns in around 20 ms (and the Oracle server is in another company; I'm using a dedicated link to access it). How can I tell my managers that I'd like to use Solr? I saw that filters in Solr take around 6-10 ms, but they're a query inside another query that was returned previously. Thanks for any help. I'd like so much to use Solr, but I really don't know how to explain this to my managers.
Re: Strange behaviour with single word and phrase
Do you have stop word filtering enabled? What does your field type look like? If stop words are ignored, you will get exactly the behavior you described. -- Jack Krupansky -Original Message- From: Alistair Young Sent: Wednesday, September 04, 2013 6:57 AM To: solr-user@lucene.apache.org Subject: Strange behaviour with single word and phrase I wonder if anyone could point me in the right direction please? If I search on the phrase "the toolkit" I get hits containing that phrase, but also hits that have the word 'the' before the word 'toolkit', no matter how far apart they are. Also, if I search on the word 'the' there are no hits at all. Thanks, Alistair - mov eax,1 mov ebx,0 int 80
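For reference, the stop-filter behavior described here comes from an analyzer chain like the sketch below (modeled on the stock example schema; the actual field type in use may differ). With "the" removed at both index and query time, the phrase "the toolkit" matches any document containing "toolkit", and a query for "the" alone matches nothing:

```xml
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Drops "the", "a", "of", ... so they never reach the index
         and are likewise stripped from queries. -->
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```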
Re: unknown _stream_source_info while indexing rich doc in solr
Did you restart Solr after editing the config and schema? -- Jack Krupansky -Original Message- From: Nutan Sent: Wednesday, September 04, 2013 3:07 AM To: solr-user@lucene.apache.org Subject: unknown _stream_source_info while indexing rich doc in solr

I am using Solr 4.2 on Windows 7. My schema is:

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="author" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="comments" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="keywords" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="contents" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="title" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="revision_number" type="string" indexed="true" stored="true" multiValued="false"/>
<dynamicField name="ignored_*" type="ignored" indexed="false" stored="false" multiValued="true"/>

solrconfig.xml:

<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">contents</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <str name="captureAttr">true</str>
  </lst>
</requestHandler>

When I execute:

curl "http://localhost:8080/solr/update/extract?literal.id=1&commit=true" -F "myfile=@abc.txt"

I get the error: unknown field "ignored_stream_source_info". I referred to Solr Cookbook 3.1 and Solr Cookbook 4, but the error is not resolved. Please help me.

--
View this message in context: http://lucene.472066.n3.nabble.com/unknown-stream-source-info-while-indexing-rich-doc-in-solr-tp4088136.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: Solr Cloud hangs when replicating updates
Kevin, Take a look at http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html and https://issues.apache.org/jira/browse/SOLR-4816. I had the same issue that you're reporting for a while, then I applied the patch from SOLR-4816 to my clients and the problems went away. If you don't feel like applying the patch, it looks like it should be included in the release of version 4.5. Also note that the problem happens more frequently when the replication factor is greater than 1. Thanks, Greg

-Original Message- From: kevin.osb...@cbsinteractive.com [mailto:kevin.osb...@cbsinteractive.com] On Behalf Of Kevin Osborn Sent: Tuesday, September 03, 2013 4:16 PM To: solr-user Subject: Solr Cloud hangs when replicating updates

I was having problems updating SolrCloud with a large batch of records. The records come in bursts with lulls between updates. At first, I just tried large updates of 100,000 records at a time. Eventually, this caused Solr to hang. When hung, I can still query Solr, but I cannot do any deletes or other updates to the index. At first, my updates were going in as SolrJ CSV posts. I have also tried local file updates and had similar results. I finally slowed things down to just use SolrJ's update feature, which is basically just JavaBin. I am also sending over just 100 at a time in 10 threads. Again, it eventually hung. Sometimes Solr hangs in the first couple of chunks; other times it hangs right away. These are my commit settings:

<autoCommit>
  <maxTime>15000</maxTime>
  <maxDocs>5000</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>3</maxTime>
</autoSoftCommit>

I have tried quite a few variations with the same results. I also tried various JVM settings with the same results. The only variable seems to be that reducing the cluster size from 2 to 1 is the only thing that helps. I also did a jstack trace. I did not see any explicit deadlocks, but I did see quite a few threads in WAITING or TIMED_WAITING.
It is typically something like this:

java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x00074039a450 (a java.util.concurrent.Semaphore$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
    at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
    at org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
    at org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
    at org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
    at org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
    at org.apache.solr.update.SolrCmdDistributor.distribAdd(SolrCmdDistributor.java:139)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:474)
    at org.apache.solr.handler.loader.CSVLoaderBase.doAdd(CSVLoaderBase.java:395)
    at org.apache.solr.handler.loader.SingleThreadedCSVLoader.addDoc(CSVLoader.java:44)
    at org.apache.solr.handler.loader.CSVLoaderBase.load(CSVLoaderBase.java:364)
    at org.apache.solr.handler.loader.CSVLoader.load(CSVLoader.java:31)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:533)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
RE: SolrCloud 4.x hangs under high update volume
Tim, Take a look at http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html and https://issues.apache.org/jira/browse/SOLR-4816. I had the same issue that you're reporting for a while, then I applied the patch from SOLR-4816 to my clients and the problems went away. If you don't feel like applying the patch, it looks like it should be included in the release of version 4.5. Also note that the problem happens more frequently when the replication factor is greater than 1. Thanks, Greg

-Original Message- From: Tim Vaillancourt [mailto:t...@elementspace.com] Sent: Tuesday, September 03, 2013 6:31 PM To: solr-user@lucene.apache.org Subject: SolrCloud 4.x hangs under high update volume

Hey guys, I am looking into an issue we've been having with SolrCloud since the beginning of our testing, all the way from 4.1 to 4.3 (haven't tested 4.4.0 yet). I've noticed other users with this same issue, so I'd really like to get to the bottom of it. Under a very, very high rate of updates (2000+/sec), after 1-12 hours we see stalled transactions that snowball to consume all Jetty threads in the JVM. This eventually causes the JVM to hang, with most threads waiting on the condition/stack provided at the bottom of this message. At this point the SolrCloud instances start to see their neighbors (who also have all threads hung) as down with "Connection Refused", and the shards go into a down state. Sometimes a node or two survives and just returns 503 "no server hosting shard" errors. As a workaround/experiment, we have tuned the number of threads sending updates to Solr, as well as the batch size (we batch updates from client to Solr) and the soft/hard autoCommits, all to no avail. We also tried turning off client-to-Solr batching (1 update = 1 call to Solr), which did not help either. Certain combinations of update threads and batch sizes seem to mask/help the problem, but not resolve it entirely.
Our current environment is the following:

- 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7.
- 3 x ZooKeeper instances, external Java 7 JVM.
- 1 collection, 3 shards, 2 replicas (each node is a leader of 1 shard and a replica of 1 shard).
- Log4j 1.2 for Solr logs, set to WARN. This log has no movement on a good day.
- 5000 max Jetty threads (well above what we use when we are healthy); Linux user-threads ulimit is 6000.
- Occurs under Jetty 8 or 9 (many versions).
- Occurs under Java 1.6 or 1.7 (several minor versions).
- Occurs under several JVM tunings.
- Everything seems to point to Solr itself, and not a Jetty or Java version (I hope I'm wrong).

The stack trace that is holding up all my Jetty QTP threads is the following, which seems to be waiting on a lock that I would very much like to understand further:

java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x0007216e68d8 (a java.util.concurrent.Semaphore$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
    at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
    at org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
    at org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
    at org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
    at org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
    at org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:96)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:462)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1178)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1486)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564)
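The parked frames above sit inside Semaphore.acquire(), reached via Solr's AdjustableSemaphore, which bounds concurrent outbound update requests. A tiny self-contained illustration of how a bounded semaphore behaves once every permit is held and never released (as happens when each holder is itself blocked on a peer):

```java
import java.util.concurrent.Semaphore;

public class SemaphoreStall {
    public static void main(String[] args) {
        // Two permits, standing in for a bounded distributor pool.
        Semaphore permits = new Semaphore(2);
        permits.acquireUninterruptibly(2);          // both permits now held
        // With no holder releasing, acquire() would park forever (the WAITING
        // state in the jstack output above); tryAcquire() shows the
        // exhaustion without blocking.
        System.out.println(permits.tryAcquire()); // prints: false
    }
}
```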
Re: Strange behaviour with single word and phrase
Yep ignoring stop words. Thanks for the pointer. Alistair - mov eax,1 mov ebx,0 int 80 On 04/09/2013 13:43, Jack Krupansky j...@basetechnology.com wrote: Do you have stop word filtering enabled? What does your field type look like? If stop words are ignored, you will get exactly the behavior you described. -- Jack Krupansky -Original Message- From: Alistair Young Sent: Wednesday, September 04, 2013 6:57 AM To: solr-user@lucene.apache.org Subject: Strange behaviour with single word and phrase I wonder if anyone could point me in the right direction please? If I search on the phrase the toolkit I get hits containing that phrase but also hits that have the word 'the' before the word 'toolkit', no matter how far apart they are. Also, if I search on the word 'the' there are no hits at all. Thanks, Alistair - mov eax,1 mov ebx,0 int 80
Re: dataimporter tika doesn't extract certain div
or could I use a filter in schema.xml, where I define a fieldType with some filter that understands XPath? On 4. Sep 2013, at 11:52 AM, Shalin Shekhar Mangar wrote: No that wouldn't work. It seems that you probably need a custom Transformer to extract the right div content. I do not know if TikaEntityProcessor supports such a thing. On Wed, Sep 4, 2013 at 12:38 PM, Andreas Owen a...@conx.ch wrote: so could I just nest it in an XPathEntityProcessor to filter the HTML, or is there something like XPath for Tika?

<entity name="htm" processor="XPathEntityProcessor" url="${rec.file}" forEach="/div[@id='content']" dataSource="main">
  <entity name="tika" processor="TikaEntityProcessor" url="${htm}" dataSource="dataUrl" onError="skip" htmlMapper="identity" format="html">
    <field column="text" />
  </entity>
</entity>

but now I don't know how to pass the text to Tika; what do I put in url and dataSource? On 3. Sep 2013, at 5:56 PM, Shalin Shekhar Mangar wrote: I don't know much about Tika but in the example data-config.xml that you posted, the xpath attribute on the field "text" won't work because the xpath attribute is used only by an XPathEntityProcessor. On Thu, Aug 29, 2013 at 10:20 PM, Andreas Owen a...@conx.ch wrote: I want Tika to only index the content in <div id="content">...</div> for the field "text". Unfortunately it's indexing the whole page. Can't XPath do this?
data-config.xml:

<dataConfig>
  <dataSource type="BinFileDataSource" name="data"/>
  <dataSource type="BinURLDataSource" name="dataUrl"/>
  <dataSource type="URLDataSource" name="main"/>
  <document>
    <entity name="rec" processor="XPathEntityProcessor"
            url="http://127.0.0.1/tkb/internet/docImportUrl.xml"
            forEach="/docs/doc" dataSource="main"> <!-- transformer="script:GenerateId" -->
      <field column="title" xpath="//title" />
      <field column="id" xpath="//id" />
      <field column="file" xpath="//file" />
      <field column="path" xpath="//path" />
      <field column="url" xpath="//url" />
      <field column="Author" xpath="//author" />
      <entity name="tika" processor="TikaEntityProcessor"
              url="${rec.path}${rec.file}" dataSource="dataUrl"
              onError="skip" htmlMapper="identity" format="html">
        <field column="text" xpath="//div[@id='content']" />
      </entity>
    </entity>
  </document>
</dataConfig>

-- Regards, Shalin Shekhar Mangar.
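Shalin's suggestion of a custom Transformer could look roughly like the sketch below. This is an assumption-laden illustration, not code from the thread: the class name is hypothetical, DIH discovers a `transformRow(Map)` method by reflection when the class is listed in the entity's `transformer` attribute, and regex-based HTML slicing only works if `<div id="content">` appears once and contains no nested divs (a real HTML parser would be more robust):

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical DIH transformer: replaces the full-page "text" value with
// just the inner HTML of <div id="content">...</div>.
class ContentDivTransformer {

    // Non-greedy match up to the FIRST closing </div>; breaks on nested divs.
    private static final Pattern CONTENT_DIV = Pattern.compile(
            "<div[^>]*id=[\"']content[\"'][^>]*>(.*?)</div>",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // DIH calls this by reflection; no interface or base class is required.
    public Object transformRow(Map<String, Object> row) {
        Object text = row.get("text");
        if (text != null) {
            row.put("text", extractContentDiv(text.toString()));
        }
        return row;
    }

    static String extractContentDiv(String html) {
        Matcher m = CONTENT_DIV.matcher(html);
        // Fall back to the full page if the div is not found.
        return m.find() ? m.group(1).trim() : html;
    }
}
```

It would presumably be wired in as `transformer="ContentDivTransformer"` (fully qualified) on the tika entity, with the compiled class on Solr's classpath.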
Re: SolrCloud 4.x hangs under high update volume
I'm going to try and fix the root cause for 4.5 - I've suspected what it is since early this year, but it's never personally been an issue, so it's rolled along for a long time. Mark Sent from my iPhone On Sep 3, 2013, at 4:30 PM, Tim Vaillancourt t...@elementspace.com wrote: Hey guys, I am looking into an issue we've been having with SolrCloud since the beginning of our testing, all the way from 4.1 to 4.3 (haven't tested 4.4.0 yet). I've noticed other users with this same issue, so I'd really like to get to the bottom of it. Under a very, very high rate of updates (2000+/sec), after 1-12 hours we see stalled transactions that snowball to consume all Jetty threads in the JVM. This eventually causes the JVM to hang with most threads waiting on the condition/stack provided at the bottom of this message. At this point SolrCloud instances then start to see their neighbors (who also have all threads hung) as down w/Connection Refused, and the shards become down in state. Sometimes a node or two survives and just returns 503s no server hosting shard errors. As a workaround/experiment, we have tuned the number of threads sending updates to Solr, as well as the batch size (we batch updates from client - solr), and the Soft/Hard autoCommits, all to no avail. Turning off Client-to-Solr batching (1 update = 1 call to Solr), which also did not help. Certain combinations of update threads and batch sizes seem to mask/help the problem, but not resolve it entirely. Our current environment is the following: - 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7. - 3 x Zookeeper instances, external Java 7 JVM. - 1 collection, 3 shards, 2 replicas (each node is a leader of 1 shard and a replica of 1 shard). - Log4j 1.2 for Solr logs, set to WARN. This log has no movement on a good day. - 5000 max jetty threads (well above what we use when we are healthy), Linux-user threads ulimit is 6000. - Occurs under Jetty 8 or 9 (many versions). - Occurs under Java 1.6 or 1.7 (several minor versions). 
Re: Boost by numFounds
I found that what can do the trick for page-rank-like indexing is externalFileField! Is there any support for uploading the external files to all Solr servers (in Solr 3 and SolrCloud)? Or should I copy the file to every Solr instance's data folder and then reload their caches? On Sat, Aug 24, 2013 at 12:36 AM, Flavio Pompermaier pomperma...@okkam.it wrote: Any help..? Is it possible to add this pagerank-like behaviour?
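For reference, a minimal externalFileField setup might look like this (field and type names are assumptions):

```xml
<!-- schema.xml sketch: an external, per-document float used for ranking -->
<fieldType name="externalRank" class="solr.ExternalFileField"
           keyField="id" defVal="0" valType="pfloat"/>
<field name="rank" type="externalRank" indexed="false" stored="false"/>
```

The values live in a plain text file named external_rank (a suffix such as external_rank.txt also works) inside each core's index data directory, one key=value pair per line:

```
doc1=3.2
doc2=0.5
```

There is no built-in distribution mechanism in Solr 3.x or early SolrCloud, so copying the file to every node's data directory is the usual approach. The file is read per searcher, so the values should refresh when a new searcher opens (from Solr 4.1 an ExternalFileFieldReloader newSearcher listener can be configured); worth verifying on your exact version. The field is then used via function queries, e.g. sort=field(rank) desc or a boost function.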
Re: SolrCloud 4.x hangs under high update volume
I am having this issue as well. I did apply this patch. Unfortunately, it did not resolve the issue in my case. On Wed, Sep 4, 2013 at 7:01 AM, Greg Walters gwalt...@sherpaanalytics.com wrote: Tim, Take a look at http://lucene.472066.n3.nabble.com/updating-docs-in-solr-cloud-hangs-td4067388.html and https://issues.apache.org/jira/browse/SOLR-4816. I had the same issue that you're reporting for a while, then I applied the patch from SOLR-4816 to my clients and the problems went away. If you don't feel like applying the patch, it looks like it should be included in the release of version 4.5. Also note that the problem happens more frequently when the replication factor is greater than 1. Thanks, Greg -----Original Message----- From: Tim Vaillancourt [mailto:t...@elementspace.com] Sent: Tuesday, September 03, 2013 6:31 PM To: solr-user@lucene.apache.org Subject: SolrCloud 4.x hangs under high update volume Hey guys, I am looking into an issue we've been having with SolrCloud since the beginning of our testing, all the way from 4.1 to 4.3 (haven't tested 4.4.0 yet). I've noticed other users with this same issue, so I'd really like to get to the bottom of it. Under a very, very high rate of updates (2000+/sec), after 1-12 hours we see stalled transactions that snowball to consume all Jetty threads in the JVM. This eventually causes the JVM to hang with most threads waiting on the condition/stack provided at the bottom of this message. At this point SolrCloud instances then start to see their neighbors (who also have all threads hung) as down w/Connection Refused, and the shards become down in state. Sometimes a node or two survives and just returns 503s no server hosting shard errors. As a workaround/experiment, we have tuned the number of threads sending updates to Solr, as well as the batch size (we batch updates from client -> solr), and the Soft/Hard autoCommits, all to no avail. We also turned off Client-to-Solr batching (1 update = 1 call to Solr), which did not help.
Re: Solr Cloud hangs when replicating updates
Thanks. If there is anything I can do to help you resolve this issue, let me know. -Kevin On Wed, Sep 4, 2013 at 7:51 AM, Mark Miller markrmil...@gmail.com wrote: I'll look at fixing the root issue for 4.5. I've been putting it off for way too long. Mark Sent from my iPhone On Sep 3, 2013, at 2:15 PM, Kevin Osborn kevin.osb...@cbsi.com wrote: I was having problems updating SolrCloud with a large batch of records. The records are coming in bursts with lulls between updates. At first, I just tried large updates of 100,000 records at a time. Eventually, this caused Solr to hang. When hung, I can still query Solr. But I cannot do any deletes or other updates to the index. At first, my updates were going as SolrJ CSV posts. I have also tried local file updates and had similar results. I finally slowed things down to just use SolrJ's Update feature, which is basically just JavaBin. I am also sending over just 100 at a time in 10 threads. Again, it eventually hung. Sometimes, Solr hangs in the first couple of chunks. Other times, it hangs right away. These are my commit settings:

<autoCommit>
  <maxTime>15000</maxTime>
  <maxDocs>5000</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>3</maxTime>
</autoSoftCommit>

I have tried quite a few variations with the same results. I also tried various JVM settings with the same results. The only thing that seems to help is reducing the cluster size from 2 to 1. I also did a jstack trace. I did not see any explicit deadlocks, but I did see quite a few threads in WAITING or TIMED_WAITING.
It is typically something like this:

java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x00074039a450 (a java.util.concurrent.Semaphore$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
    at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
    at org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
    at org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
    at org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
    at org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
    at org.apache.solr.update.SolrCmdDistributor.distribAdd(SolrCmdDistributor.java:139)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:474)
    at org.apache.solr.handler.loader.CSVLoaderBase.doAdd(CSVLoaderBase.java:395)
    at org.apache.solr.handler.loader.SingleThreadedCSVLoader.addDoc(CSVLoader.java:44)
    at org.apache.solr.handler.loader.CSVLoaderBase.load(CSVLoaderBase.java:364)
    at org.apache.solr.handler.loader.CSVLoader.load(CSVLoader.java:31)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:533)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at
Re: Solr Cloud hangs when replicating updates
I'll look at fixing the root issue for 4.5. I've been putting it off for way too long. Mark Sent from my iPhone On Sep 3, 2013, at 2:15 PM, Kevin Osborn kevin.osb...@cbsi.com wrote: I was having problems updating SolrCloud with a large batch of records. The records are coming in bursts with lulls between updates. At first, I just tried large updates of 100,000 records at a time. Eventually, this caused Solr to hang. When hung, I can still query Solr. But I cannot do any deletes or other updates to the index. At first, my updates were going as SolrJ CSV posts. I have also tried local file updates and had similar results. I finally slowed things down to just use SolrJ's Update feature, which is basically just JavaBin. I am also sending over just 100 at a time in 10 threads. Again, it eventually hung. Sometimes, Solr hangs in the first couple of chunks. Other times, it hangs right away. These are my commit settings:

<autoCommit>
  <maxTime>15000</maxTime>
  <maxDocs>5000</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>3</maxTime>
</autoSoftCommit>

I have tried quite a few variations with the same results. I also tried various JVM settings with the same results. The only thing that seems to help is reducing the cluster size from 2 to 1. I also did a jstack trace. I did not see any explicit deadlocks, but I did see quite a few threads in WAITING or TIMED_WAITING.
It basically appears that Solr
Need help on Joining and sorting syntax and limitations between multiple documents in solr-4.4.0
Hi Team, In my project I am going to use Apache Solr 4.4.0 for searching. I need to join multiple Solr documents within the same core on a field that is common across the documents. Using the Solr 4.4.0 join syntax the join itself succeeds and returns the expected result, but my next requirement is to sort the returned result on fields from the documents on the "from" side of the join, which I was not able to do. Let me explain the problem in detail along with the files I am using ...

1) Files being used:

a. Picklist_1.xml --
<add><doc>
  <field name="describedObjectId">t1324838</field>
  <field name="describedObjectType">7</field>
  <field name="picklistItemId">956</field>
  <field name="siteId">130712901</field>
  <field name="en">Draft</field>
  <field name="gr">Draoft</field>
</doc></add>

b. Picklist_2.xml ---
<add><doc>
  <field name="describedObjectId">t1324837</field>
  <field name="describedObjectType">7</field>
  <field name="picklistItemId">87749</field>
  <field name="siteId">130712901</field>
  <field name="en">New</field>
  <field name="gr">Neuo</field>
</doc></add>

c. AssetID_1.xml ---
<add><doc>
  <field name="def14227_picklist">t1324837</field>
  <field name="describedObjectId">a180894808</field>
  <field name="describedObjectType">1</field>
  <field name="isMetadataComplete">true</field>
  <field name="lastUpdateDate">2013-09-02T09:28:18Z</field>
  <field name="ownerId">130713716</field>
  <field name="siteId">130712901</field>
</doc></add>

d. AssetID_2.xml
<add><doc>
  <field name="def14227_picklist">t1324838</field>
  <field name="describedObjectId">a171658357</field>
  <field name="describedObjectType">1</field>
  <field name="ownerId">130713716</field>
  <field name="rGroupId">2283961</field>
  <field name="rGroupId">2290309</field>
  <field name="rGroupPermissionLevel">7</field>
  <field name="rGroupPermissionLevel">7</field>
  <field name="rRuleId">13503796</field>
  <field name="rRuleId">15485964</field>
  <field name="rUgpId">38052</field>
  <field name="rUgpId">41133</field>
  <field name="siteId">130712901</field>
</doc></add>

2) Requirement:
i. It needs to join the files using the def14227_picklist field from AssetID_1.xml and AssetID_2.xml and the describedObjectId field from Picklist_1.xml and Picklist_2.xml.
ii. After joining, we need all the fields from the AssetID_*.xml files and the en and gr fields from the Picklist_*.xml files.
iii. While joining, we also need to sort the result based on the en field value.

3) I was trying with the q={!join from=inner_id to=outer_id}zzz:vvv syntax, but no luck.

Any help/suggestion would be appreciated. Thanks, Sukanta Dey
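For reference, with the field names above the join would presumably be written like this (a sketch, assuming all documents live in one core). Note the documented limitation: Solr's query-time join returns only documents from the "to" side, so "from"-side fields such as en and gr are not available for sorting or in the returned field list, which is exactly the wall hit by requirements ii and iii:

```text
q={!join from=describedObjectId to=def14227_picklist}en:New
```

This should select the AssetID_* documents whose linked picklist item has en:New; sorting those results by en would require denormalizing the en value onto the asset documents at index time (or writing a custom search component).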
How to config SOLR server for spell check functionality
I want to implement the spell check functionality offered by Solr using a MySQL database, but I don't understand how. Here is the basic flow of what I want to do: I have a simple inputText (in JSF) and if I type the word "shwo", the response in the outputLabel should be "show". First of all, I'm using the following tools and frameworks: JBoss application server 6.1, Eclipse, JPA, JSF (PrimeFaces). Steps I've done until now:

Step 1: Download the Solr server from http://lucene.apache.org/solr/downloads.html and extract the contents.

Step 2: Add an environment variable. Variable name: solr.solr.home. Variable value: D:\JBOSS\solr-4.4.0\solr-4.4.0\example\solr --- where you have the Solr server.

Step 3: Open the solr war and add an env-entry to solr.war\WEB-INF\web.xml (the easy way): solr/home D:\JBOSS\solr-4.4.0\solr-4.4.0\example\solr java.lang.String. OR import the project, change it and build the war.

Step 4: In the browser open localhost:8080/solr/ and the Solr console appears. Until now all works well.

I have found some useful (in my opinion) code that returns:

[collection1] webapp=/solr path=/spell params={spellcheck=on&q=whatever&wt=javabin&qt=/spell&version=2&spellcheck.build=true} hits=0 status=0 QTime=16

Here is the code that gives the result above:

SolrServer solr;
try {
    solr = new CommonsHttpSolrServer("http://localhost:8080/solr");
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("qt", "/spell");
    params.set("q", "whatever");
    params.set("spellcheck", "on");
    params.set("spellcheck.build", "true");
    QueryResponse response = solr.query(params);
    SpellCheckResponse spellCheckResponse = response.getSpellCheckResponse();
    if (!spellCheckResponse.isCorrectlySpelled()) {
        for (Suggestion suggestion : response.getSpellCheckResponse().getSuggestions()) {
            System.out.println("original token: " + suggestion.getToken()
                    + " - alternatives: " + suggestion.getAlternatives());
        }
    }
} catch (Exception e) {
    e.printStackTrace();
}

Questions:
1. How do I make the connection with my database and search its content to see if there are any words that could match?
2. How do I do the configuration (solrconfig.xml, schema.xml, etc.)?
3. How do I send a string from my view (xhtml) so that the Solr server knows what to look for?

I read all the information about Solr but it's still unclear. Links:
Main page: http://lucene.apache.org/solr/
Main page tutorial: http://lucene.apache.org/solr/4_4_0/tutorial.html
Solr wiki: http://wiki.apache.org/solr/Solrj --- official SolrJ documentation; http://wiki.apache.org/solr/SpellCheckComponent
Solr config: http://wiki.apache.org/solr/SolrConfigXml http://www.installationpage.com/solr/solr-configuration-tutorial-schema-solrconfig-xml/ http://wiki.apache.org/solr/SchemaXml
StackOverflow proof: Solr Did you mean (Spell check component)
Solr database integration: http://www.slideshare.net/th0masr/integrating-the-solr-search-engine http://www.cabotsolutions.com/2009/05/using-solr-lucene-for-full-text-search-with-mysql-db/
Solr spell check: http://docs.lucidworks.com/display/solr/Spell+Checking http://searchhub.org/2010/08/31/getting-started-spell-checking-with-apache-lucene-and-solr/ http://techiesinsight.blogspot.ro/2012/06/using-solr-spellchecker-from-java.html http://blog.websolr.com/post/2748574298/spellcheck-with-solr-spellcheckcomponent
How to use the SpellingResult class in SolrJ

I really need your help. Regards. -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-config-SOLR-server-for-spell-check-functionality-tp4088163.html Sent from the Solr - User mailing list archive at Nabble.com.
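To sketch answers to questions 1 and 2 (all names here, such as the database, table and field names, are assumptions): the usual pattern is to pull the MySQL content into an indexed field with the DataImportHandler, then build the spellcheck dictionary from that field. Something along these lines:

```xml
<!-- data-config.xml (sketch): pull rows from MySQL into an indexed field -->
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/mydb" user="user" password="pass"/>
  <document>
    <entity name="words" query="SELECT id, word FROM dictionary">
      <field column="id" name="id"/>
      <field column="word" name="word"/>
    </entity>
  </document>
</dataConfig>
```

```xml
<!-- solrconfig.xml (sketch): dictionary built from the indexed "word" field,
     exposed through the /spell handler the SolrJ snippet already calls -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">word</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">on</str>
    <str name="spellcheck.dictionary">default</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

Question 3 is then just the existing SolrJ snippet with params.set("q", userInput) fed from the JSF backing bean.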
Re: solr performance against oracle
On Wed, 2013-09-04 at 14:06 +0200, Sergio Stateri wrote: I'm trying to change the data access in the company where I work from Oracle to Solr. They work on different principles and fulfill different needs. Comparing them with a performance-oriented test is not likely to be a usable basis for selecting between them. Start by describing your typical use cases instead. Solr always returns the data in around 150~200 ms (from localhost), but Oracle returns in around 20 ms (and the Oracle server is in another company; I'm using a dedicated link to access it). 200 ms is suspiciously slow for a trivial lookup in 800,000 values. I am sure we can bring that down to Oracle-time or better, but I do not think it shows much. How can I tell my managers that I'd like to use Solr? Why would you like to use Solr?
Solr highlighting fragment issue
Hi, I'm having some issues with Solr search results (using Solr 1.4). I have enabled highlighting of searched text (hl=true) and set the fragment size to 500 (hl.fragsize=500) in the search query. Below is a screen shot of the results shown when I searched for the term 'grandfather' (2 results are displayed). Now I have a couple of problems with this. 1. In the search results the keyword appears inconsistently towards the start/end of the text. I'd like to control the number of characters appearing before and after the keyword match (highlighted term). More specifically, I'd like to get the keyword match somewhere around the middle of the resultant text. 2. The total number of characters appearing in the search result never equals the fragment size I specified (500 characters). It varies to a great extent (for example 408 or 520). Please share your thoughts on achieving the above 2 results. Thanks & Regards, Sreehareesh KM
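Some hedged pointers on both points (parameter names are from the stock Solr 1.4 highlighter; the field name "text" is an assumption): hl.fragsize is a target size, not an exact one. The default gap fragmenter breaks fragments on token boundaries, which is why 408 or 520 characters come back instead of exactly 500. The regex fragmenter offers somewhat more control over where a fragment starts and ends:

```text
hl=true&hl.fl=text&hl.snippets=1&hl.fragsize=500
   &hl.fragmenter=regex&hl.regex.slop=0.2&hl.maxAnalyzedChars=51200
```

Here hl.regex.slop is the fraction by which a fragment may deviate from hl.fragsize. Centering the matched keyword exactly in the middle is not a supported knob in 1.4; trimming the snippet around the <em> markers client-side is the usual fallback.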
Re: SolrCloud 4.x hangs under high update volume
There is an issue if I remember right, but I can't find it right now. If anyone that has the problem could try this patch, that would be very helpful: http://pastebin.com/raw.php?i=aaRWwSGP - Mark On Wed, Sep 4, 2013 at 8:04 AM, Markus Jelsma markus.jel...@openindex.io wrote: Hi Mark, Got an issue to watch? Thanks, Markus -----Original Message----- From: Mark Miller markrmil...@gmail.com Sent: Wednesday 4th September 2013 16:55 To: solr-user@lucene.apache.org Subject: Re: SolrCloud 4.x hangs under high update volume I'm going to try and fix the root cause for 4.5 - I've suspected what it is since early this year, but it's never personally been an issue, so it's rolled along for a long time. Mark Sent from my iPhone On Sep 3, 2013, at 4:30 PM, Tim Vaillancourt t...@elementspace.com wrote: Hey guys, I am looking into an issue we've been having with SolrCloud since the beginning of our testing, all the way from 4.1 to 4.3 (haven't tested 4.4.0 yet). I've noticed other users with this same issue, so I'd really like to get to the bottom of it. Under a very, very high rate of updates (2000+/sec), after 1-12 hours we see stalled transactions that snowball to consume all Jetty threads in the JVM. This eventually causes the JVM to hang with most threads waiting on the condition/stack provided at the bottom of this message. At this point SolrCloud instances then start to see their neighbors (who also have all threads hung) as down w/Connection Refused, and the shards become down in state. Sometimes a node or two survives and just returns 503s no server hosting shard errors. As a workaround/experiment, we have tuned the number of threads sending updates to Solr, as well as the batch size (we batch updates from client -> solr), and the Soft/Hard autoCommits, all to no avail. We also turned off Client-to-Solr batching (1 update = 1 call to Solr), which did not help. Certain combinations of update threads and batch sizes seem to mask/help the problem, but not resolve it entirely.
Re: Solr Cloud hangs when replicating updates
It would be great if you could give this patch a try: http://pastebin.com/raw.php?i=aaRWwSGP - Mark

On Wed, Sep 4, 2013 at 8:31 AM, Kevin Osborn kevin.osb...@cbsi.com wrote:

Thanks. If there is anything I can do to help you resolve this issue, let me know. -Kevin

On Wed, Sep 4, 2013 at 7:51 AM, Mark Miller markrmil...@gmail.com wrote:

I'll look at fixing the root issue for 4.5. I've been putting it off for way too long. Mark Sent from my iPhone

On Sep 3, 2013, at 2:15 PM, Kevin Osborn kevin.osb...@cbsi.com wrote:

I was having problems updating SolrCloud with a large batch of records. The records come in bursts with lulls between updates. At first, I tried large updates of 100,000 records at a time. Eventually, this caused Solr to hang. When hung, I can still query Solr, but I cannot do any deletes or other updates to the index. At first, my updates were going in as SolrJ CSV posts. I have also tried local file updates with similar results. I finally slowed things down to just use SolrJ's update feature, which is basically just JavaBin. I am also sending over just 100 records at a time in 10 threads. Again, it eventually hung. Sometimes Solr hangs in the first couple of chunks; other times it hangs right away. These are my commit settings:

<autoCommit>
  <maxTime>15000</maxTime>
  <maxDocs>5000</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>3</maxTime>
</autoSoftCommit>

I have tried quite a few variations with the same results. I also tried various JVM settings, again with the same results. The only thing that seems to help is reducing the cluster size from 2 to 1. I also did a jstack trace. I did not see any explicit deadlocks, but I did see quite a few threads in WAITING or TIMED_WAITING.
It is typically something like this:

java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for 0x00074039a450 (a java.util.concurrent.Semaphore$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
at org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
at org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
at org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
at org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
at org.apache.solr.update.SolrCmdDistributor.distribAdd(SolrCmdDistributor.java:139)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:474)
at org.apache.solr.handler.loader.CSVLoaderBase.doAdd(CSVLoaderBase.java:395)
at org.apache.solr.handler.loader.SingleThreadedCSVLoader.addDoc(CSVLoader.java:44)
at org.apache.solr.handler.loader.CSVLoaderBase.load(CSVLoaderBase.java:364)
at org.apache.solr.handler.loader.CSVLoader.load(CSVLoader.java:31)
at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:533)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at
Questions about Replication Factor on solrcloud
Hi all, I'm currently working on deploying a SolrCloud distribution on CentOS machines and would like some more guidance on replication factor configuration. I have set up two servers with SolrCloud over Tomcat and a third server as ZooKeeper. The configuration succeeded: one server has collection1 available and the other has collection1_Shard1_Replica1. My questions are:
- Can I have 1 shard and 2 replicas on two machines? What are the limitations or considerations when defining this?
- How do replicas work? (there is not much info about this)
- When I import data into collection1 it works properly, but when I do it on collection1_Shard1_Replica1 it fails. Is that expected behavior? (Maybe once I have a better definition of replicas I will understand it better.)
Thanks in advance for your help and guidance. Regards, Lisandro Montano
Re: SolrCloud 4.x hangs under high update volume
Thanks guys! :)

Mark: this patch is much appreciated; I will try to test it shortly, hopefully today. For my curiosity/understanding, could someone explain to me quickly what locks SolrCloud takes on updates? Was I on to something in thinking that more shards decrease the chance of locking? Secondly, I was wondering if someone could summarize what this patch 'fixes'? I'm not too familiar with Java and the Solr codebase (working on that though :D). Cheers, Tim

On 4 September 2013 09:52, Mark Miller markrmil...@gmail.com wrote:

There is an issue if I remember right, but I can't find it right now. If anyone who has the problem could try this patch, that would be very helpful: http://pastebin.com/raw.php?i=aaRWwSGP - Mark ...
Re: cleanup after OutOfMemoryError
I don't know that there is any 'safe' thing you can do other than restart - but if I were to try anything, I would use true for rollback. - Mark

On Wed, Sep 4, 2013 at 9:44 AM, Ryan McKinley ryan...@gmail.com wrote:

I have an application where I am calling DirectUpdateHandler2 directly with: update.addDoc(cmd); This will sometimes hit: java.lang.OutOfMemoryError: Java heap space ... Is there anything I can/should do to clean up after the OOME? ... or perhaps 'true' for rollback? Thanks Ryan
Re: SolrCloud 4.x hangs under high update volume
The 'lock', or semaphore, was added to cap the number of threads that would be used. Previously, the number of threads in use could spike to many, many thousands under heavy updates. A limit on the number of outstanding requests was put in place to keep this from happening - something like 16 * the number of hosts in the cluster. I assume the deadlock comes from the fact that requests are of two kinds: forwards to the leader, and distrib updates from the leader to replicas. A forward to the leader actually waits for the leader to distrib the update to the replicas before returning. I believe this is what can lead to deadlock. This is likely why the patch for CloudSolrServer can help the situation - it removes the need to forward to the leader because it sends to the correct leader to begin with. It is only useful if you are adding docs with CloudSolrServer though, and is more like a workaround than a fix. The patch uses a separate 'limiting' semaphore for the two cases. - Mark

On Sep 4, 2013, at 10:22 AM, Tim Vaillancourt t...@elementspace.com wrote:

Thanks guys! :) Mark: this patch is much appreciated, I will try to test this shortly, hopefully today. For my curiosity/understanding, could someone explain to me quickly what locks SolrCloud takes on updates? ...
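Mark's deadlock scenario can be sketched with a toy model: forward-to-leader requests each hold a permit from a shared semaphore while they wait for a distrib-to-replicas step that needs a permit from the same semaphore. Once forwards hold every permit, the distrib side can never proceed. This is an illustrative sketch only, not Solr code; the names, permit count, and timeouts are invented for the demo (the timeouts exist purely so it terminates instead of actually deadlocking):

```python
import threading

PERMITS = 2  # toy cap on outstanding requests (Solr used ~16 * hosts)
shared = threading.Semaphore(PERMITS)
held = threading.Semaphore(0)        # signals that a forward holds a permit
distrib_done = threading.Event()     # would be set when the distrib finishes

def forward_to_leader():
    # A forward request takes a permit, then blocks until the leader's
    # distrib-to-replicas step completes.
    shared.acquire()
    held.release()
    distrib_done.wait(timeout=1.0)   # timeout only so the demo terminates
    shared.release()

threads = [threading.Thread(target=forward_to_leader) for _ in range(PERMITS)]
for t in threads:
    t.start()
for _ in range(PERMITS):             # wait until every permit is held
    held.acquire()

# The distrib request that would unblock the forwards now needs a permit
# from the same exhausted semaphore -- deadlock in the real system.
got_permit = shared.acquire(timeout=0.5)
print("distrib got a permit:", got_permit)
for t in threads:
    t.join()
```

Using two separate semaphores for the two request kinds, as the patch reportedly does, means the distrib side no longer competes with the forwards for the same permits.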
cleanup after OutOfMemoryError
I have an application where I am calling DirectUpdateHandler2 directly with:

update.addDoc(cmd);

This will sometimes hit:

java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.util.UnicodeUtil.UTF16toUTF8(UnicodeUtil.java:248)
at org.apache.lucene.store.DataOutput.writeString(DataOutput.java:234)
at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.writeField(CompressingStoredFieldsWriter.java:273)
at org.apache.lucene.index.StoredFieldsProcessor.finishDocument(StoredFieldsProcessor.java:126)
at org.apache.lucene.index.TwoStoredFieldsConsumers.finishDocument(TwoStoredFieldsConsumers.java:65)
at org.apache.lucene.index.DocFieldProcessor.finishDocument(DocFieldProcessor.java:264)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:283)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:432)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1513)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:212)
at voyager.index.zmq.IndexingRunner.apply(IndexingRunner.java:303)

and then a little while later:

auto commit error...:java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2726)
at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2897)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2872)
at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:549)
at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)

Is there anything I can/should do to clean up after the OOME? At a minimum I do not want any new requests using the same IndexWriter. Should I use:

catch (OutOfMemoryError ex) {
  update.getCommitTracker().cancelPendingCommit();
  update.newIndexWriter(false);
  ...

or perhaps 'true' for rollback? Thanks Ryan
subindex
Hi, Is there a way to build a new (smaller) index from an existing (larger) index, where the smaller index contains a subset of the fields of the larger index? Thank you
RE: SolrCloud 4.x hangs under high update volume
Hi Mark, Got an issue to watch? Thanks, Markus

-Original message- From: Mark Miller markrmil...@gmail.com Sent: Wednesday 4th September 2013 16:55 To: solr-user@lucene.apache.org Subject: Re: SolrCloud 4.x hangs under high update volume

I'm going to try and fix the root cause for 4.5 - I've suspected what it is since early this year, but it's never personally been an issue, so it's rolled along for a long time. Mark Sent from my iPhone

On Sep 3, 2013, at 4:30 PM, Tim Vaillancourt t...@elementspace.com wrote:

Hey guys, I am looking into an issue we've been having with SolrCloud since the beginning of our testing, all the way from 4.1 to 4.3 ...
Re: Numeric fields and payload
Chris Hostetter hossman_lucene at fucit.org writes:

: is it possible to store (text) payload to numeric fields (class
: solr.TrieDoubleField)? My goal is to store measure units to numeric
: features - e.g. '1.5 cm' - and to use faceted search with these fields.
: But the field type doesn't allow analyzers to add the payload data. I
: want to avoid database access to load the units. I'm using Solr 4.2.

I'm not sure if it's possible to add payloads to Trie fields, but even if there is a way, I don't think you really want that for your use case -- I think it would make a lot more sense to normalize your units, so you get consistent sorting, range queries, and faceting on the values regardless of whether it's 100cm or 1000mm or 1m. -Hoss

Hoss, what you suggest may be fine for specific units, but for monetary values with formatting it is not realistic. $10,000.00 would require formatting the number to display it. It would be much easier to store the formatted string as a payload alongside the value. Peter Lenahan
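Hoss's normalization suggestion amounts to converting every raw value to one canonical unit at index time, so that 100cm, 1000mm, and 1m all index as the same number and sorting, range queries, and faceting agree. A minimal illustration of the idea (the unit table and parsing rules here are invented for the example, not part of Solr):

```python
import re

# Map each recognized unit to its factor in the canonical unit (meters).
TO_METERS = {"mm": 0.001, "cm": 0.01, "m": 1.0, "km": 1000.0}

def normalize_length(raw):
    """Parse '1.5 cm' / '1000mm' / '1m' into a canonical meter value."""
    m = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*(mm|cm|m|km)\s*", raw)
    if m is None:
        raise ValueError("unparseable length: %r" % raw)
    return float(m.group(1)) * TO_METERS[m.group(2)]

# All three spellings normalize to the same number before indexing.
print(normalize_length("100cm"), normalize_length("1000mm"), normalize_length("1m"))
```

The original unit string (or a formatted value such as "$10,000.00", per Peter's point) can then be kept in a separate stored field for display, which avoids needing payloads on the numeric field at all.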
Little XsltResponseWriter documentation bug (Attn: Wiki Admin)
Hi, http://wiki.apache.org/solr/XsltResponseWriter (and the reference manual PDF too) has become out of date. In the configuration section:

<queryResponseWriter name="xslt" class="org.apache.solr.request.XSLTResponseWriter">
  <int name="xsltCacheLifetimeSeconds">5</int>
</queryResponseWriter>

the class name org.apache.solr.request.XSLTResponseWriter should be replaced by org.apache.solr.response.XSLTResponseWriter; otherwise a ClassNotFoundException happens. The change is a result of https://issues.apache.org/jira/browse/SOLR-1602 as far as I can see. Apparently I can't update that page myself - could someone else please do that? Thanks!
RE: Solr highlighting fragment issue
I'm having some issues with Solr search results (using Solr 1.4). I have enabled highlighting of searched text (hl=true) and set the fragment size to 500 (hl.fragsize=500) in the search query. Below are the (screenshot) results shown when I searched for the term 'grandfather' (2 results are displayed). Now I have a couple of problems with this:
1. In the search results the keyword appears inconsistently towards the start/end of the text. I'd like to control the number of characters appearing before and after the keyword match (highlighted term). More specifically, I'd like to get the keyword match somewhere around the middle of the resultant text.
2. The total number of characters appearing in the search result never equals the fragment size I specified (500 characters); it varies considerably (for example 408 or 520).
Please share your thoughts on achieving the above 2 results.

I can't see your screenshot, but it doesn't really matter. If I remember correctly how this stuff works, I think you're going to have a challenge getting where you want to get. In your position, I would push back on both of those requirements rather than try to solve the problem. For (1), the issue is that, IIRC, the highlighter breaks up your documents into fragments BEFORE it knows where the matches are. I'd think you'd have to pretty seriously recast the algorithm to get the result you want. For (2), it may well be that you could tune the fragmenter to get closer to your desired number of characters, either by writing your own, or by using the available regexes and whatnot. But getting an exact number of characters does not seem reasonable, because I'm pretty sure that there is a constraint that a matching term must appear in its entirety in one fragment - and also that sometimes fragments are concatenated. Imagine, for example, a matched phrase where the start of the phrase is in one fragment and the end is in another. Which goes back to the first point.
So if you absolutely must have both of these (and the second one is strange, since it implies that your fragments will often start and end in the middle of words), then I guess you would need to rewrite the fragmenting algorithm to drive fragmenting from the matches. -- Bryan
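The "drive fragmenting from the matches" approach Bryan describes can be sketched as: locate the match first, then cut a fixed-size window around it so the highlighted term lands near the middle. This is an illustrative sketch only, not the Solr highlighter's actual algorithm - it handles a single term, ignores phrases, and cuts mid-word exactly as the second requirement implies:

```python
def centered_fragment(text, term, size=500):
    """Return a size-character window with the first match of `term`
    near its middle; fall back to the document head if no match."""
    i = text.lower().find(term.lower())
    if i < 0:
        return text[:size]
    # Align the middle of the match with the middle of the window.
    start = max(0, i + len(term) // 2 - size // 2)
    start = min(start, max(0, len(text) - size))  # don't run past the end
    return text[start:start + size]

# Demo document with the keyword buried in the middle.
doc = ("x" * 1000) + " grandfather " + ("y" * 1000)
frag = centered_fragment(doc, "grandfather", size=500)
print(len(frag), "grandfather" in frag)
```

Unlike the stock fragmenter, this always yields exactly `size` characters (when the document is long enough) with the match centered, at the cost of word-boundary-aware snippets.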
Re: Little XsltResponseWriter documentation bug (Attn: Wiki Admin)
It's a wiki. Can't you correct it? Upayavira

On Wed, Sep 4, 2013, at 08:25 PM, Dmitri Popov wrote:

Hi, http://wiki.apache.org/solr/XsltResponseWriter (and reference manual PDF too) has become out of date ... Apparently I can't update that page myself, please could someone else do that? Thanks!
Re: Little XsltResponseWriter documentation bug (Attn: Wiki Admin)
Upayavira, I could edit that page myself, but I need to be confirmed as human according to http://wiki.apache.org/solr/FrontPage#How_to_edit_this_Wiki. My wiki account name is 'pin', just in case.

On Wed, Sep 4, 2013 at 5:27 PM, Upayavira u...@odoko.co.uk wrote:

It's a wiki. Can't you correct it? Upayavira ...
Re: SolrCloud 4.x hangs under high update volume
Thanks so much for the explanation Mark, I owe you one (many)! We have this on our high-TPS cluster and will run it through its paces tomorrow. I'll provide any feedback I can; more soon! :D Cheers, Tim
Invalid Version when slave node pull replication from master node
Hi Solr users, I'm testing replication within SolrCloud. I just uncommented the replication section separately on the master and slave nodes. The replication section setting on the master node:

<lst name="master">
  <str name="replicateAfter">commit</str>
  <str name="replicateAfter">startup</str>
  <str name="confFiles">schema.xml,stopwords.txt</str>
</lst>

and on the slave node:

<lst name="slave">
  <str name="masterUrl">http://10.7.23.124:8080/solr/#/</str>
  <str name="pollInterval">00:00:50</str>
</lst>

After startup, an error comes out on the slave node:

80110110 [snapPuller-70-thread-1] ERROR org.apache.solr.handler.SnapPuller ?.Master at: http://10.7.23.124:8080/solr/#/ is not available. Index fetch failed. Exception: Invalid version (expected 2, but 60) or the data in not in 'javabin' format

Could anyone help me solve the problem? regards
Re: Invalid Version when slave node pull replication from master node
Hi again, I'm using Solr 4.4.

2013/9/5 YouPeng Yang yypvsxf19870...@gmail.com:

Hi Solr users, I'm testing replication within SolrCloud ... Could anyone help me solve the problem? regards
Re: Invalid Version when slave node pull replication from master node
Hi all

I solved the problem by adding the core name explicitly, following http://wiki.apache.org/solr/SolrReplication#Replicating_solrconfig.xml.

I would like to confirm two things: is it necessary to set the core name explicitly? And is there any SolrJ API to pull the replication on the slave node from the master node?

regards
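As far as I know there is no dedicated SolrJ method for this in 4.x; the usual route is the ReplicationHandler's HTTP API, which any HTTP client can invoke. A hedged sketch, assuming a slave core named "core1" (hypothetical):

```
# Ask the slave's ReplicationHandler to pull the index from its configured
# master now, instead of waiting for pollInterval:
curl "http://10.7.23.124:8080/solr/core1/replication?command=fetchindex"

# Check replication status on either node:
curl "http://10.7.23.124:8080/solr/core1/replication?command=details"
```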
Re: unknown _stream_source_info while indexing rich doc in solr
Yes, I did restart Tomcat.

On Wed, Sep 4, 2013 at 6:27 PM, Jack Krupansky-2 [via Lucene] ml-node+s472066n4088181...@n3.nabble.com wrote:

Did you restart Solr after editing the config and schema?

-- Jack Krupansky

-----Original Message-----
From: Nutan
Sent: Wednesday, September 04, 2013 3:07 AM
To: [hidden email]
Subject: unknown _stream_source_info while indexing rich doc in solr

I am using Solr 4.2 on Windows 7. My schema is:

  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="author" type="string" indexed="true" stored="true" multiValued="true"/>
  <field name="comments" type="text" indexed="true" stored="true" multiValued="false"/>
  <field name="keywords" type="text" indexed="true" stored="true" multiValued="false"/>
  <field name="contents" type="text" indexed="true" stored="true" multiValued="false"/>
  <field name="title" type="text" indexed="true" stored="true" multiValued="false"/>
  <field name="revision_number" type="string" indexed="true" stored="true" multiValued="false"/>
  <dynamicField name="ignored_*" type="ignored" indexed="false" stored="false" multiValued="true"/>

solrconfig.xml:

  <requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
    <lst name="defaults">
      <str name="fmap.content">contents</str>
      <str name="lowernames">true</str>
      <str name="uprefix">ignored_</str>
      <str name="captureAttr">true</str>
    </lst>
  </requestHandler>

When I execute:

  curl "http://localhost:8080/solr/update/extract?literal.id=1&commit=true" -F "myfile=@abc.txt"

I get the error: unknown field ignored_stream_source_info. I referred to Solr Cookbook 3.1 and Solr Cookbook 4 but the error is not resolved. Please help me.

--
View this message in context: http://lucene.472066.n3.nabble.com/unknown-stream-source-info-while-indexing-rich-doc-in-solr-tp4088136.html
Sent from the Solr - User mailing list archive at Nabble.com.
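For anyone hitting the same error: it usually means the ignored_* dynamic field is not taking effect, either because the attributes ran together in schema.xml or because no "ignored" field type is declared. A sketch of what the relevant schema.xml lines normally look like, based on the stock Solr 4 example schema (verify against your own schema):

```xml
<!-- Sketch from the Solr 4 example schema, not a verified fix for this setup. -->
<!-- The "ignored" type silently discards anything mapped to it. -->
<fieldType name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField"/>

<!-- Catch-all for Tika's metadata fields (e.g. stream_source_info), which
     uprefix=ignored_ renames to ignored_stream_source_info and so on. -->
<dynamicField name="ignored_*" type="ignored" multiValued="true"/>
```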
--
View this message in context: http://lucene.472066.n3.nabble.com/unknown-stream-source-info-while-indexing-rich-doc-in-solr-tp4088136p4088295.html
Sent from the Solr - User mailing list archive at Nabble.com.