Re: Solr hanging when extracting some broken .doc files
On 17/12/2013 15:29, Augusto Camarotti wrote:
Hi guys,
I'm having a problem with Solr when trying to index some broken .doc files. I have set up a test case using Solr to index all the files the users save on the shared directories of the company that I work for, and Solr is hanging when trying to index this file in particular (the one I'm attaching to this e-mail). There are some other broken .doc files that Solr indexes by name without a problem, even logging some Tika errors during the process, but when it reaches this file in particular, it hangs and I have to cancel the upload. I cannot guarantee the directories will never hold a broken .doc file, or a broken file with some other extension, so I guess Solr could just return a failure message, or something like that. These are the logging messages Solr is recording:

03:38:23 ERROR SolrCore org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@386f9474
03:38:25 ERROR SolrDispatchFilter null:org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@386f9474

So, how do I prevent Solr from hanging when trying to index broken files?
Regards,
Augusto Camarotti

We don't like to run Tika from within Solr ourselves, as it has been known to barf (especially on large PDF files; yes, there are such horrors as 3000-page PDFs!). We usually run it in an external process so it can be watched and killed if necessary.

Cheers
Charlie
--
Charlie Hull
Flax - Open Source Enterprise Search
tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828
web: www.flax.co.uk
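One way to do what Charlie describes (keeping Tika out of the Solr JVM) is Tika's ForkParser, which runs parsing in a forked JVM that can be monitored and killed without taking Solr down. A minimal sketch, assuming a Tika 1.x-era API; the file name and what you do with the extracted text are placeholders:

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.tika.fork.ForkParser;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;

public class ExternalTika {
    public static void main(String[] args) throws Exception {
        // Parsing happens in a separate JVM, so a hung OfficeParser
        // cannot hang the process that feeds Solr.
        ForkParser parser = new ForkParser(
                ExternalTika.class.getClassLoader(), new AutoDetectParser());
        try (InputStream in = Files.newInputStream(Paths.get("broken.doc"))) {
            BodyContentHandler handler = new BodyContentHandler(-1); // no write limit
            parser.parse(in, handler, new Metadata(), new ParseContext());
            System.out.println(handler.toString()); // send this text to Solr, e.g. via SolrJ
        } catch (Exception e) {
            System.err.println("Broken file, skipping: " + e); // fail cleanly, don't hang
        } finally {
            parser.close();
        }
    }
}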
Re: PostingsSolrHighlighter
hi Josip

For the first question we've done similar things: copying search fields to a text field. But highlighting is normally on specific fields such as title. Depending on how the search content is displayed to the front end, you can search on text and highlight on the field you want by specifying hl.fl, ref: http://wiki.apache.org/solr/HighlightingParameters#hl.fl

On 17 December 2013 02:29, Josip Delic j...@lugensa.com wrote:
Hi @all,
i am playing with the PostingsSolrHighlighter. I'm running Solr 4.6.0 and my configuration is from here:
https://lucene.apache.org/solr/4_6_0/solr-core/org/apache/solr/highlight/PostingsSolrHighlighter.html
Search query and result (not working): http://pastebin.com/13Uan0ZF
Schema (not complete): http://pastebin.com/JGa38UDT
Search query and result (working): http://pastebin.com/4CP8XKnr
Solr config:

<searchComponent class="solr.HighlightComponent" name="highlight">
  <highlighting class="org.apache.solr.highlight.PostingsSolrHighlighter"/>
</searchComponent>

So this is working just fine, but now i have some questions:
1.) With the old default highlighter component it was possible to search in searchable_text and to retrieve highlighted text. This is essential, because we use copyField to put almost everything into searchable_text (title, subtitle, description, ...)
2.) I can't get ellipsis working. I tried hl.tag.ellipsis=..., f.text.hl.tag.ellipsis=..., and configuring it in the RequestHandler; nothing seems to work, and maxAnalyzedChars is just cutting the sentence?
Kind Regards
Josip Delic
--
All the best
Liu Bo
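For reference, Liu Bo's suggestion expressed in SolrJ terms; a sketch, assuming a Solr 4.6-era client and the field names from Josip's pastebins (whether the postings highlighter honors this across copyFields is exactly what Josip questions below):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class HighlightOtherField {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("searchable_text:labore");
        q.setHighlight(true);
        q.set("hl.fl", "text"); // search on the copyField, highlight the display field
        QueryResponse rsp = server.query(q);
        System.out.println(rsp.getHighlighting());
        server.shutdown();
    }
}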
Re: Solr hanging when extracting some broken .doc files
Charlie,

Does it mean you are talking to it from a client program? Or are you running Tika in a listen/server mode and building some adapters for standard Solr processes?

Regards,
Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Wed, Dec 18, 2013 at 3:47 PM, Charlie Hull char...@flax.co.uk wrote:
We don't like to run Tika from within Solr ourselves, as it has been known to barf (especially on large PDF files; yes, there are such horrors as 3000-page PDFs!). We usually run it in an external process so it can be watched and killed if necessary.
Cheers
Charlie
--
Charlie Hull
Flax - Open Source Enterprise Search
Re: an array-like string is treated as multivalued when adding a doc to Solr
Hi Alexandre

It's quite a rare case, just one out of tens of thousands. I'm planning to have every multilingual field as multivalued and just get the first one while formatting the response to our business object. The first-value update processor seems very helpful, thank you.

All the best
Liu Bo

On 18 December 2013 15:26, Alexandre Rafalovitch arafa...@gmail.com wrote:
If this happens rarely and you want to deal with it on the way into Solr, you could just keep one of the values, using a URP:
http://lucene.apache.org/solr/4_6_0/solr-core/org/apache/solr/update/processor/FirstFieldValueUpdateProcessorFactory.html
Regards,
Alex
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Wed, Dec 18, 2013 at 2:20 PM, Liu Bo diabl...@gmail.com wrote:
Hey Furkan and solr users
This is a misreported problem. It's not a Solr problem but a data issue on our side. Sorry for this. A coupon happened to have two pieces of English description, which is not allowed in our business logic, but it happened and we added name_en_US twice to the Solr document. I've done a set of tests and some deep debugging into the Solr source code, and found out that an array-like string such as [Get 20% Off Official Barca Kits, coupon] won't be treated as a multivalued field. Sorry again for not digging more before sending out the question email. I trust our business logic and data integrity more than Solr; I will definitely not do this again. ;-)
All the best
Liu Bo

On 11 December 2013 07:21, Furkan KAMACI furkankam...@gmail.com wrote:
Hi Liu;
Yes, it is expected behavior. If you send data within square brackets Solr will treat it as a multivalued field. You can test it this way: if you use SolrJ and use a List for a field, it will be considered multivalued too, because when you call the toString() method of your List you can see that the elements are printed within square brackets. This is the reason that a List can be used for a multivalued field. If you explain your situation I can offer a way to do it.
Thanks;
Furkan KAMACI

2013/12/6 Liu Bo diabl...@gmail.com
Dear solr users:
I've met this kind of error several times: when adding an array-like string such as [Get 20% Off Official Barça Kits, coupon] to a multiValued="false" field, Solr will complain:
org.apache.solr.common.SolrException: ERROR: [doc=7781396456243918692] multiple values encountered for non multiValued field name_en_US: [Get 20% Off Official Barca Kits, coupon]
my schema definition:

<field name="name_en_US" type="text_en" indexed="true" stored="true" multiValued="false" />

This field is stored because the search result needs this field and its value in the original format, and indexed to give it a boost while searching. What I do is add the name (java.lang.String) to a SolrInputDocument via the addField("name_en_US", product.getName()) method, and then add this to Solr using an AddUpdateCommand. It seems Solr treats this kind of string data as multivalued, even though I add this field to Solr only once. Is this a bug or supposed behavior? Is there any way to tell Solr this is not a multivalued value and not to break it up?
Your help and suggestions will be much appreciated.
--
All the best
Liu Bo
--
All the best
Liu Bo
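A quick SolrJ illustration of the pitfall Liu Bo hit (adding the same single-valued field twice) and one way to avoid it client-side; a sketch, with the field and values taken from the thread:

import org.apache.solr.common.SolrInputDocument;

public class SingleValuedField {
    public static void main(String[] args) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("name_en_US", "Get 20% Off Official Barca Kits");
        // A second addField on the same name appends a value, which a
        // multiValued="false" field rejects at index time:
        // doc.addField("name_en_US", "coupon");
        // setField replaces any existing value instead of appending:
        doc.setField("name_en_US", "coupon");
        System.out.println(doc.getFieldValues("name_en_US")); // [coupon]
    }
}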
DataImport Handler, writing a new EntityProcessor
Hi all!

I've got a question regarding writing a new EntityProcessor, in the same sense as the Tika one. My EntityProcessor should analyze jpg images and create document fields to be used with the LIRE Solr plugin (https://bitbucket.org/dermotte/liresolr). Basically I've taken the same approach as the TikaEntityProcessor, but my setup just indexes the first of 1000 images. I'm using a FileListEntityProcessor to get all JPEGs from a directory and then I'm handing them over (see [2]). My code for the EntityProcessor is at [1]. I've tried to use the DataSource as well as the filePath attribute, but it ends up all the same. However, the FileListEntityProcessor is able to read all the files according to the debug output, but I'm missing the link from the FileListEntityProcessor to the LireEntityProcessor.

I'd appreciate any pointers or help :)

cheers,
Mathias

[1] LireEntityProcessor http://pastebin.com/JFajkNtf
[2] dataConfig http://pastebin.com/vSHucatJ
--
Dr. Mathias Lux
Klagenfurt University, Austria
http://tinyurl.com/mlux-itec
PeerSync Recovery fails, starting Replication Recovery
Hi,

In our SolrCloud cluster (2 shards, 8 replicas), the replicas go from time to time into recovering state, and it takes more than 10 minutes to finish recovering. In the logs, we see that PeerSync recovery fails with the message:

PeerSync: core=fr_green url=http://solr-08/searchsolrnodefr too many updates received since start - startingUpdates no longer overlaps with our currentUpdates

Then Replication Recovery starts. Is there something we can do to avoid the failure of PeerSync recovery so that the recovery process is faster (less than 10 minutes)?

The full trace log is here:

2013-12-05 13:51:53,740 [http-8080-46] INFO org.apache.solr.handler.admin.CoreAdminHandler:handleRequestRecoveryAction:705 - It has been requested that we recover
2013-12-05 13:51:53,740 [http-8080-112] INFO org.apache.solr.handler.admin.CoreAdminHandler:handleRequestRecoveryAction:705 - It has been requested that we recover
2013-12-05 13:51:53,740 [http-8080-112] INFO org.apache.solr.servlet.SolrDispatchFilter:handleAdminRequest:658 - [admin] webapp=null path=/admin/cores params={action=""} status=0 QTime=0
2013-12-05 13:51:53,740 [Thread-1544] INFO org.apache.solr.cloud.ZkController:publish:1017 - publishing core=fr_green state=recovering
2013-12-05 13:51:53,741 [http-8080-46] INFO org.apache.solr.servlet.SolrDispatchFilter:handleAdminRequest:658 - [admin] webapp=null path=/admin/cores params={action=""} status=0 QTime=1
2013-12-05 13:51:53,740 [Thread-1543] INFO org.apache.solr.cloud.ZkController:publish:1017 - publishing core=fr_green state=recovering
2013-12-05 13:51:53,743 [Thread-1544] INFO org.apache.solr.cloud.ZkController:publish:1021 - numShards not found on descriptor - reading it from system property
2013-12-05 13:51:53,746 [Thread-1543] INFO org.apache.solr.cloud.ZkController:publish:1021 - numShards not found on descriptor - reading it from system property
2013-12-05 13:51:53,755 [Thread-1543] WARN org.apache.solr.cloud.RecoveryStrategy:close:105 - Stopping recovery for zkNodeName=solr-08_searchsolrnodefr_fr_greencore=fr_green
2013-12-05 13:51:53,756 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:run:216 - Starting recovery process. core=fr_green recoveringAfterStartup=false
2013-12-05 13:51:53,762 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:doRecovery:495 - Finished recovery process. core=fr_green
2013-12-05 13:51:53,762 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:run:216 - Starting recovery process. core=fr_green recoveringAfterStartup=false
2013-12-05 13:51:53,765 [RecoveryThread] INFO org.apache.solr.cloud.ZkController:publish:1017 - publishing core=fr_green state=recovering
2013-12-05 13:51:53,765 [RecoveryThread] INFO org.apache.solr.cloud.ZkController:publish:1021 - numShards not found on descriptor - reading it from system property
2013-12-05 13:51:53,767 [RecoveryThread] INFO org.apache.solr.client.solrj.impl.HttpClientUtil:createClient:103 - Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
2013-12-05 13:51:54,777 [main-EventThread] INFO org.apache.solr.common.cloud.ZkStateReader:process:210 - A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating...
(live nodes size: 18)
2013-12-05 13:51:56,804 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:doRecovery:356 - Attempting to PeerSync from http://solr-02/searchsolrnodefr/fr_green/ core=fr_green - recoveringAfterStartup=false
2013-12-05 13:51:56,806 [RecoveryThread] WARN org.apache.solr.update.PeerSync:sync:232 - PeerSync: core=fr_green url=http://solr-08/searchsolrnodefr too many updates received since start - startingUpdates no longer overlaps with our currentUpdates
2013-12-05 13:51:56,806 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:doRecovery:394 - PeerSync Recovery was not successful - trying replication. core=fr_green
2013-12-05 13:51:56,806 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:doRecovery:397 - Starting Replication Recovery. core=fr_green
2013-12-05 13:51:56,806 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:doRecovery:399 - Begin buffering updates. core=fr_green
2013-12-05 13:51:56,806 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:replicate:127 - Attempting to replicate from http://solr-02/searchsolrnodefr/fr_green/. core=fr_green
2013-12-05 13:51:56,806 [RecoveryThread] INFO org.apache.solr.client.solrj.impl.HttpClientUtil:createClient:103 - Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
2013-12-05 13:52:01,203 [RecoveryThread] INFO org.apache.solr.handler.SnapPuller:init:211 - No value set for 'pollInterval'. Timer Task not started.
2013-12-05 13:52:01,209 [RecoveryThread] INFO
Re: PostingsSolrHighlighter
On 18.12.2013 09:55, Liu Bo wrote:
hi Josip
for the first question we've done similar things: copying search fields to a text field. But highlighting is normally on specific fields such as title. Depending on how the search content is displayed to the front end, you can search on text and highlight on the field you want by specifying hl.fl, ref: http://wiki.apache.org/solr/HighlightingParameters#hl.fl

hi liu,

that's exactly what i'm doing in that pastebin: http://pastebin.com/13Uan0ZF

I'm searching there for 'q=searchable_text:labore'; this term is present in 'text' and in the copyField 'searchable_text', but it is not highlighted in 'text' (hl.fl=text). The same query is working if I set 'q=text:labore', as you can see in http://pastebin.com/4CP8XKnr

For the second question I figured out that the PostingsSolrHighlighter ellipsis is not, as I thought, for adding an ellipsis at the start or/and end of the highlighted text. It is instead used to combine multiple snippets together if snippets is > 1.

cheers

josip
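Josip's finding about the ellipsis, expressed as a request; a sketch in SolrJ terms, assuming a Solr 4.6-era client and the fields from the pastebins: hl.tag.ellipsis is the string used to join multiple snippets, not a prefix/suffix.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class EllipsisExample {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("text:labore");
        q.setHighlight(true);
        q.set("hl.fl", "text");
        q.set("hl.snippets", "3");         // request multiple passages...
        q.set("hl.tag.ellipsis", " ... "); // ...joined by this string, not prepended/appended
        System.out.println(server.query(q).getHighlighting());
        server.shutdown();
    }
}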
Wildcard queries and custom char filter
Hello,

I have a problem with configuring a custom char filter. When there are no wildcards in the query, my filter is invoked. When there are wildcards, my filter is not invoked. Is it possible to configure a charFilter to be used with wildcard queries? I can see that with wildcards, TokenizerChain.charFilters is null.

Configuration:

<analyzer type="query">
  <charFilter class="a.b.c.MyFilterFactory" />
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>

What is more interesting, I can see that solr.LowerCaseFilterFactory is invoked even with wildcards. I tried to transform the charFilter into a normal Filter but the result is the same (it is not invoked with wildcards).

Best
Service Unavailable Error.
I'm having this error in my logs:

ERROR - dat1 - 2013-12-18 11:40:11.704; org.apache.solr.update.StreamingSolrServers$1; error
org.apache.solr.common.SolrException: Service Unavailable
request: http://192.168.20.106:8983/solr/statistics-13_shard12_replica4/update?update.distrib=FROMLEADER&distrib.from=http%3A%2F%2F192.168.20.101%3A8983%2Fsolr%2Fstatistics-13_shard12_replica5%2F&wt=javabin&version=2
        at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:240)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

The machine is zen: no load, no I/O. How can it be unavailable? I'm on Solr 4.6.0 in SolrCloud mode.

-
Best regards
No registered leader was found, but the UI says that I have one.
I'm getting an error on Solr 4.6.0 about leader registration. The admin UI shows this: http://picpaste.com/a839446d0808df205aa7be78c780ed32.png

But my logs say:

ERROR - dat6 - 2013-12-18 11:43:54.253; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: No registered leader was found, collection:statistics-13 slice:shard23_1
        at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:484)
        at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:467)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:223)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:428)
        at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
        at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:89)
        at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:151)
        at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:131)
        at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:223)
        at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:116)
        at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:188)
        at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:114)
        at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:158)
        at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:99)
        at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
        at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:710)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
        at org.eclipse.jetty.server.Server.handle(Server.java:368)
        at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
        at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
        at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
        at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:953)
        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
        at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
        at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
        at java.lang.Thread.run(Unknown Source)

Any idea how I can fix this?

-
Best regards
Re: solr as nosql - pulling all docs vs deep paging limitations
You can do range queries without an upper bound and just limit the number of results. Then you look at the last result to obtain the new lower bound.

-- Jens

On 17/12/13 20:23, Petersen, Robert wrote:
My use case is basically to do a dump of all the contents of the index, with no ordering needed. It's actually to be a product data export for third parties. The unique key is the product sku. I could take the min sku and range query up to the max sku, but the skus are not contiguous because some get turned off and only some are valid for export, so each range would return a different number of products (which may or may not be acceptable, and I might be able to kind of hide that with some code).

-Original Message-
From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com]
Sent: Tuesday, December 17, 2013 10:41 AM
To: solr-user
Subject: Re: solr as nosql - pulling all docs vs deep paging limitations

Hoss,
What about "SELECT * FROM WHERE ..." like misuse of Solr? I'm sure you've been asked many times for that. What if the client doesn't need to rank results somehow, but just requests an unordered filtering result like they are used to in an RDBMS? Do you feel it will never be considered a reasonable use case for Solr, or is there a well-known approach for dealing with it?

On Tue, Dec 17, 2013 at 10:16 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

: Then I remembered we currently don't allow deep paging in our current
: search indexes as performance declines the deeper you go. Is this still
: the case?

Coincidentally, I'm working on a new cursor-based API to make this much more feasible as we speak:
https://issues.apache.org/jira/browse/SOLR-5463
I did some simple perf testing of the strawman approach and posted the results last week:
http://searchhub.org/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
...current iterations on the patch are to eliminate the strawman code to improve performance even more and beef up the test cases.

: If so, is there another approach to make all the data in a collection
: easily available for retrieval? The only thing I can think of is to
...
: Then I was thinking we could have a field with an incrementing numeric
: value which could be used to perform range queries as a substitute for
: paging through everything. Ie queries like 'IncrementalField:[1 TO
: 100]' 'IncrementalField:[101 TO 200]' but this would be difficult to
: maintain as we update the index unless we reindex the entire collection
: every time we update any docs at all.

As I mentioned in the blog above, as long as you have a uniqueKey field that supports range queries, bulk exporting of all documents is fairly trivial: sort on your uniqueKey field and use an fq that also filters on your uniqueKey field, modifying the fq each time to change the lower bound to match the highest ID you got on the previous page. This approach works really well in simple cases where you want to fetch all documents matching a query and then process/sort them by some other criteria on the client, but it's not viable if it's important to you that the documents come back from Solr in score order before your client gets them, because you want to stop fetching once some criteria is met in your client.
Example: you have billions of documents matching a query, you want to fetch all sorted by score desc and crunch them on your client to compute some stats, and once your client-side stat crunching tells you you have enough results (which might be after the 1000th result, or might be after the millionth result) then you want to stop. SOLR-5463 will help even in that latter case. The bulk of the patch should be easy to use in the next day or so (having other people try it out and test it in their applications would be *very* helpful) and hopefully show up in Solr 4.7

-Hoss
http://www.lucidworks.com/

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
Re: Wildcard queries and custom char filter
Hi,

Yes, some factories implement org.apache.lucene.analysis.util.MultiTermAwareComponent. Please see more at http://wiki.apache.org/solr/MultitermQueryAnalysis

On Wednesday, December 18, 2013 1:05 PM, michallos michal.ware...@gmail.com wrote:
I have a problem with configuring a custom char filter. When there are no wildcards in the query, my filter is invoked. When there are wildcards, my filter is not invoked. Is it possible to configure a charFilter to be used with wildcard queries?
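For a custom char filter factory, that means something like the following sketch (Lucene/Solr 4.x-era API; the pass-through create() stands in for whatever the factory already builds):

import java.io.Reader;
import java.util.Map;
import org.apache.lucene.analysis.util.AbstractAnalysisFactory;
import org.apache.lucene.analysis.util.CharFilterFactory;
import org.apache.lucene.analysis.util.MultiTermAwareComponent;

public class MyFilterFactory extends CharFilterFactory implements MultiTermAwareComponent {

    public MyFilterFactory(Map<String, String> args) {
        super(args);
    }

    @Override
    public Reader create(Reader input) {
        // wrap 'input' with your existing char filter here; pass-through shown
        return input;
    }

    // Tells the query parser to apply this factory during multi-term
    // (wildcard/prefix/fuzzy) analysis as well.
    @Override
    public AbstractAnalysisFactory getMultiTermComponent() {
        return this;
    }
}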
Re: Wildcard queries and custom char filter
It works! Thanks. Last question: how to invoke the charFilter before the tokenizer? I can see that with the StandardTokenizerFactory tokenizer, without wildcards the text 123-abc is broken into two tokens, 123 and abc, but the text *123-abc* remains unchanged: *123-abc*. Is it possible to use a charFilter before the tokenizers?
RE: DataImport Handler, writing a new EntityProcessor
The first thing I would suggest is to try and run it not in debug mode. DIH's debug mode limits the number of documents it will take in, so that might be all that is wrong here.

James Dyer
Ingram Content Group
(615) 213-4311
solrcloud no server hosting shard
Hi guys,

Before starting, note that I am new to Solr and in particular to SolrCloud. I have to index many, many documents (10 mln). Last week I completed my import handler and configuration, so I started the import activity on Solr using SolrCloud with 10 shards (and without replicas :S) on VMs with 30 GB of RAM and good performance (I don't know if 10 is too many). Today I saw that during a specific update (delete of a wrong document) of a specific document the following exception was thrown:

org.apache.solr.common.SolrException: no servers hosting shard:
        at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:148)
        at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:118)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

After restarting SolrCloud the exception is thrown during the update of another document. Nevertheless, search queries work fine, but slowly. I thank you so much in advance if you can help me with this exception or if you have any suggestions for my configuration. Is it a must to have some replicas of the shards? Can I add a replica now, after some millions of documents have been indexed? To configure SolrCloud I essentially used the default configuration and I have read the general SolrCloud wiki; are there any suggestions for using Solr with this number of documents in a more comfortable way?

Thanks again,
Giuseppe

p.s. sorry for my english :)
Re: DataImport Handler, writing a new EntityProcessor
Unfortunately it is the same in non-debug mode, just the first document. I also output the params to sout, but it seems only the first one ever arrives at my custom class. I've the feeling that I'm doing something seriously wrong here, based on a complete misunderstanding :) I basically assume that the nested entity processor will be called for each of the rows that come out of its parent. I've read somewhere that the data has to be taken from the data source, and I've implemented that, but it doesn't seem to change anything.

cheers,
Mathias

On Wed, Dec 18, 2013 at 3:05 PM, Dyer, James james.d...@ingramcontent.com wrote:
The first thing I would suggest is to try and run it not in debug mode. DIH's debug mode limits the number of documents it will take in, so that might be all that is wrong here.

--
PD Dr. Mathias Lux
Klagenfurt University, Austria
http://tinyurl.com/mlux-itec
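Mathias's assumption matches how DIH works: for each row the parent entity emits, the child entity is re-initialized and then nextRow() is called until it returns null. A common pitfall in TikaEntityProcessor-style processors is a "done" flag that is never re-armed. A sketch of the expected shape (field and attribute names are hypothetical, not Mathias's actual code, which is in the pastebin above):

import java.util.HashMap;
import java.util.Map;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.EntityProcessorBase;

public class LireEntityProcessor extends EntityProcessorBase {

    private boolean done = false;

    @Override
    public void init(Context context) {
        super.init(context);
        done = false; // re-armed for every row the parent entity emits
    }

    @Override
    public Map<String, Object> nextRow() {
        if (done) {
            return null; // this child yields one document per parent row
        }
        done = true;
        // filePath comes from the enclosing FileListEntityProcessor row,
        // e.g. filePath="${files.fileAbsolutePath}" in data-config.xml
        String filePath = context.getResolvedEntityAttribute("filePath");
        Map<String, Object> row = new HashMap<String, Object>();
        row.put("id", filePath);
        // ...extract LIRE features from the image and add them as fields...
        return row;
    }
}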
Dynamically deriving the param value in solrconfig requestHandler
hi,

Is there any way to derive the value of one param from other params, like below?

<requestHandler name="/main" class="com.solr.custom.handler.MySearchHandler">
  <arr name="components">
    <str>query</str>
    <str>debug</str>
  </arr>
  <lst name="defaults">
    <str name="size_relaxed">size:['$minSize' TO '$maxSize']</str>
    <!-- minSize and maxSize will be supplied as query parameters, or else default to the values below (i.e. size_relaxed=size:[0 TO 1]) -->
    <str name="minSize">0</str>
    <str name="maxSize">1</str>
  </lst>
</requestHandler>

Thanks

Regards,
Senthilnathan V
Re: solr as nosql - pulling all docs vs deep paging limitations
Aha! SOLR-5244 is the particular case which I'm asking about. I wonder who else considers it useful? (I'm sorry if I hijacked the thread)

On 18.12.2013 5:41, Joel Bernstein joels...@gmail.com wrote:
They are for different use cases. Hoss's approach, I believe, focuses on deep paging of ranked search results. SOLR-5244 focuses on the batch export of an entire unranked search result in binary format. It's basically a very efficient bulk extract for Solr.

On Tue, Dec 17, 2013 at 6:51 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:
Joel - can you please elaborate a bit on how this compares with Hoss' approach? Complementary?
Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Tue, Dec 17, 2013 at 6:45 PM, Joel Bernstein joels...@gmail.com wrote:
SOLR-5244 is also working in this direction. This focuses on efficient binary extract of entire search results.

On Tue, Dec 17, 2013 at 2:33 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:
Hoss is working on it. Search for deep paging or cursor in JIRA.
Otis
Solr & ElasticSearch Support
http://sematext.com/

--
Joel Bernstein
Search Engineer at Heliosearch
RE: Solr failure results in misreplication?
Any chance you still have the logs from the servers hosting 1 & 2? I would open a JIRA ticket for this one as it sounds like something went terribly wrong on restart. You can update /clusterstate.json to fix this situation.

Lastly, it's recommended to use an OOM killer script with SolrCloud so that you don't end up with zombie nodes hanging around in your cluster. I use something like:

-XX:OnOutOfMemoryError=$SCRIPT_DIR/oom_solr.sh $x %p

$x in the start script is the port # and %p is the process ID ... My oom_solr.sh script is something like this:

#!/bin/bash
SOLR_PORT=$1
SOLR_PID=$2
NOW=$(date +%F%T)
(
echo "Running OOM killer script for process $SOLR_PID for Solr on port 89$SOLR_PORT"
kill -9 $SOLR_PID
echo "Killed process $SOLR_PID"
) | tee oom_killer-89$SOLR_PORT-$NOW.log

I use supervisord to handle the restart after the process gets killed by the OOM killer, which is why you don't see the restart in this script ;-)

Timothy Potter
Sr. Software Engineer, LucidWorks
www.lucidworks.com

From: youknow...@heroicefforts.net
Sent: Tuesday, December 17, 2013 10:31 PM
To: solr-user@lucene.apache.org
Subject: Solr failure results in misreplication?

My client has a test cluster (Solr 4.6) with three instances 1, 2, and 3 hosting shards 1, 2, and 3, respectively. There is no replication in this cluster. We started receiving OOMEs during indexing; likely the batches were too large. The cluster was rebooted to restore the system. However, upon reboot, instance 2 now shows as a replica of shard 1, and its shard2 is down with a null range. Instance 2 is queryable (shards.tolerant=true&distribute=false) and returns a different set of records than instance 1 (as would be expected during normal operations). Clusterstate.json is similar to the following:

mycollection:{
  shard1:{
    range:800-d554,
    state:active,
    replicas:{
      instance1:{state:active...},
      instance2:{state:active...}
    }
  },
  shard3:{state:active...},
  shard2:{
    range:null,
    state:active,
    replicas:{
      instance2:{state:down}
    }
  },
  maxShardsPerNode:1,
  replicationFactor:1
}

Any ideas on how this would come to pass? Would manually correcting the clusterstate.json in Zk correct this situation?
Re: Wildcard queries and custom char filter
Hoh, I can see that when there are wildcards then KeywordTokenizerFactory is used instead of StandardTokenizerFactory. I created a custom wildcard-remover char filter for a few specific cases (so I cannot use any of the regex replacer filters), but even with that, KeywordTokenizerFactory is used. I thought a charFilter would be enough, but there is more complicated logic in SolrQueryParserBase#handleBareTokenQuery that chooses KeywordTokenizerFactory before my charFilter is invoked! Is it possible to handle a custom wildcard remover, so that StandardTokenizerFactory may be used?
Re: solr as nosql - pulling all docs vs deep paging limitations
: What about "SELECT * FROM WHERE ..." like misuse of Solr? I'm sure
: you've been asked many times for that. What if the client doesn't need
: to rank results somehow, but just requests an unordered filtering
: result like they are used to in an RDBMS? Do you feel it will never be
: considered a reasonable use case for Solr, or is there a well-known
: approach for dealing with it?

If you don't care about ordering, then the approach I described (either using SOLR-5463, or just using a sort by uniqueKey with increasing range filters on the id) should work fine; the fact that they come back sorted by id is just an implementation detail that makes it possible to batch the records (the same way most SQL databases will likely give you back the docs based on whatever primary key index you have).

I think the key difference between approaches like SOLR-5244 vs the cursor work in SOLR-5463 is that SOLR-5244 is really targeted at dumping all data about all docs from a core (matching the query) in a single request/response. For something like SolrCloud, the client would manually need to hit each shard (but as I understand it from the description, that's kind of the point; it's aiming to be a very low-level bulk export).

With the cursor approach in SOLR-5463, we do aggregation across all shards, we support arbitrary sorts, and you can control the batch size from the client and iterate over multiple request/responses of that size. If there are any network hiccups, you can re-do a request. If you process half the docs that match (in a particular order) and then decide "I've got all the docs I need for my purposes", you can stop requesting the continuation of that cursor.

-Hoss
http://www.lucidworks.com/
Re: solr as nosql - pulling all docs vs deep paging limitations
: You can do range queries without an upper bound and just limit the number of
: results. Then you look at the last result to obtain the new lower bound.

Exactly. Instead of this:

First: q=foo&start=0&rows=$ROWS
After: q=foo&start=$X&rows=$ROWS

...where $ROWS is how big a batch of docs you can handle at one time, and you increase the value of $X by the value of $ROWS on each successive request, you can just do this...

First: q=foo&start=0&rows=$ROWS&sort=id+asc
After: q=foo&start=0&rows=$ROWS&sort=id+asc&fq=id:{$X TO *]

...where $X is whatever the last id you got on the previous page was. Or: you try out the patch in SOLR-5463 and do something like this...

First: q=foo&start=0&rows=$ROWS&sort=id+asc&cursorMark=*
After: q=foo&start=0&rows=$ROWS&sort=id+asc&cursorMark=$X

...where $X is whatever nextCursorMark you got from the previous page.

-Hoss
http://www.lucidworks.com/
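A SolrJ sketch of the id-range batching Hoss describes, assuming a Solr 4.6-era client, a string uniqueKey named "id", and a hypothetical collection URL:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class BulkExport {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        String lastId = null;
        while (true) {
            SolrQuery q = new SolrQuery("foo")
                    .setRows(1000)
                    .setSort("id", SolrQuery.ORDER.asc);
            if (lastId != null) {
                // exclusive lower bound: skip the last id we saw; escape the
                // id value if your keys can contain query syntax characters.
                // {!cache=false} keeps these one-off pages out of the filter
                // cache (see Hoss's note later in this thread).
                q.addFilterQuery("{!cache=false}id:{" + lastId + " TO *]");
            }
            SolrDocumentList docs = server.query(q).getResults();
            if (docs.isEmpty()) break;
            for (SolrDocument d : docs) { /* process d */ }
            lastId = (String) docs.get(docs.size() - 1).getFieldValue("id");
        }
        server.shutdown();
    }
}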
Re: solr as nosql - pulling all docs vs deep paging limitations
Us too. That's going to be huge for us!

Michael Della Bitta
Applications Developer
o: +1 646 532 3062 | c: +1 917 477 7906
appinions inc. "The Science of Influence Marketing"
18 East 41st Street, New York, NY 10017
t: @appinions https://twitter.com/Appinions
w: appinions.com http://www.appinions.com/

On Wed, Dec 18, 2013 at 9:55 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote:
Aha! SOLR-5244 is the particular case which I'm asking about. I wonder who else considers it useful? (I'm sorry if I hijacked the thread)
Re: solr as nosql - pulling all docs vs deep paging limitations
On 12/17/13 1:16 PM, Chris Hostetter wrote:
As I mentioned in the blog above, as long as you have a uniqueKey field that supports range queries, bulk exporting of all documents is fairly trivial by sorting on your uniqueKey field and using an fq that also filters on your uniqueKey field; modify the fq each time to change the lower bound to match the highest ID you got on the previous page.

Aha, very nice suggestion; I hadn't thought of this when trying myself to figure out decent ways to 'fetch all documents matching a query' for some bulk offline processing.

One question that I was never sure about when trying to do things like this: is this going to end up blowing the query and/or document caches if used on a live Solr, by filling up those caches with the results of the 'bulk' export? If so, is there any way to avoid that? Or does it probably not really matter?

Jonathan
Re: Solr3.4 on tomcat 7.0.23 - hung with error threw exception java.lang.IllegalStateException: Cannot call sendError() after the response has been committed
Were you able to resolve this issue, and if so, how? I am encountering the same issue in a couple of Solr versions (including 4.0 and 4.5).
org.apache.solr.update.DirectUpdateHandler2; No uncommitted changes. Skipping IW.commit.
Hi,

I am using the ExtractingRequestHandler to extract text from binary data and then index the text, but I'm getting this error:

org.apache.solr.update.DirectUpdateHandler2; No uncommitted changes. Skipping IW.commit.

solrconfig.xml:

<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="fmap.content">attachment</str>
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>

<lib dir="/var/solrdev/solr-4.5.0/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="/var/solrdev/solr-4.5.0/dist/" regex=".*\.jar" />
<lib dir="/var/solrdev/solr-4.5.0/dist/" regex="solr-cell-4.5.0.jar" />

schema.xml:

<field name="attachment" type="string" indexed="true" stored="true" required="false" multiValued="true"/>
<fieldType name="string" class="solr.TextField" omitNorms="true"/>

CURL request:

curl "http://localhost:8085/solr/openwave/update/extract?literal.msg-uid=9&commit=true" -F myFile=Dummy.doc

I do not understand where the problem is. Please suggest.
Re: solr as nosql - pulling all docs vs deep paging limitations
: One question that I was never sure about when trying to do things like
: this -- is this going to end up blowing the query and/or document caches
: if used on a live Solr? By filling up those caches with the results of
: the 'bulk' export? If so, is there any way to avoid that? Or does it
: probably not really matter?

q={!cache=false}...

-Hoss
http://www.lucidworks.com/
Shards stuck in down state after splitting shard - How can we recover from a failed SPLITSHARD?
I called SPLITSHARD on a shard in an existing SolrCloud instance, where the shard had ~1 million documents in it. It's been about 3 hours since the splitting completed, and the subshards are still stuck in a Down state. They are reported as down in localhost/solr/#/~cloud, and I'm unable to query my index. How can we recover from a failed SPLITSHARD operation?
Re: DataImport Handler, writing a new EntityProcessor
Hi Mathias,

I'd recommend testing one thing at a time. See if you can get it to work for one image before you try a directory of images. Also try testing using the solr-testframework in your IDE (I use Eclipse) to debug, rather than your browser/print statements. Hopefully that will give you some more specific knowledge of what's happening around your plugin.

I also wrote an EntityProcessor plugin to read from a properties file (https://issues.apache.org/jira/browse/SOLR-3928). Hopefully that'll give you some insight about this kind of Solr plugin and how to test it.

Cheers,
Tricia
Re: solr as nosql - pulling all docs vs deep paging limitations
On Wed, Dec 18, 2013 at 8:03 PM, Chris Hostetter hossman_luc...@fucit.org wrote:
If you don't care about ordering, then the approach I described (either using SOLR-5463, or just using a sort by uniqueKey with increasing range filters on the id) should work fine; the fact that they come back sorted by id is just an implementation detail that makes it possible to batch the records

From the functional standpoint it's true, but performance might matter in those edge cases, e.g. I wonder why the priority queue is needed even if we request sort=_docid_.

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
Solr could replace shards
I am considering using SolrCloud, but I have a use case that I am not sure it covers. I would like to keep an index up to date in real time, but I would also like to sometimes restate the past. The way that I would restate the past is to do batch processing over historical data.

My idea is that I would have the Solr collection sharded by date range. As I move forward in time I would add more shards. For restating historical data I would have a separate process that actually indexes a shard's worth of data. (This keeps the servers that are meant for production search from having to handle the load of indexing historical data.) I would then move the index files to the Solr servers and register the newly created index with the server, replacing the existing shards.

I used to be able to do something similar pre-SolrCloud by using the core admin. But this did not have the benefit of having one search for the entire collection; I had to manually query each of the cores to get the full search index.

Essentially the question is:
1- Is it possible to shard by date range in this way?
2- Is it possible to swap out the index used by a shard?
3- Is there a different way I should be thinking of this?

Max
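For reference, the pre-SolrCloud technique Max mentions (swapping a rebuilt core in for the live one via the core admin) looks roughly like this in SolrJ; a sketch with hypothetical core names, and note that a plain core SWAP is not something SolrCloud's clusterstate knows about:

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;
import org.apache.solr.common.params.CoreAdminParams.CoreAdminAction;

public class SwapRebuiltCore {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        // Swap the live core with one rebuilt offline from historical data.
        CoreAdminRequest swap = new CoreAdminRequest();
        swap.setAction(CoreAdminAction.SWAP);
        swap.setCoreName("archive_2013_12");              // hypothetical live core
        swap.setOtherCoreName("archive_2013_12_rebuilt"); // hypothetical rebuilt core
        swap.process(server);
        server.shutdown();
    }
}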
Re: Shards stuck in down state after splitting shard - How can we recover from a failed SPLITSHARD?
Hi, Is the parent shard currently active? What does the clusterstate.json say? The sub-shards could be stuck in the down state while trying to recover, but as far as I remember, the sub-shards only get marked active (and the parent goes inactive) once the recovery and replication (for as many replicas as the parent shard has) are completed. On Wed, Dec 18, 2013 at 10:01 AM, cwhi chris.whi...@gmail.com wrote: I called SPLITSHARD on a shard in an existing SolrCloud instance, where the shard had ~1 million documents in it. It's been about 3 hours since the splitting completed, and the sub-shards are still stuck in a Down state. They are reported as down in localhost/solr/#/~cloud, and I'm unable to query my index. How can we recover from a failed SPLITSHARD operation? -- View this message in context: http://lucene.472066.n3.nabble.com/Shards-stuck-in-down-state-after-splitting-shard-How-can-we-recover-from-a-failed-SPLITSHARD-tp4107297.html Sent from the Solr - User mailing list archive at Nabble.com. -- Anshum Gupta http://www.anshumgupta.net
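The cluster-state check Anshum suggests can also be done programmatically instead of reading clusterstate.json by hand; a hedged sketch against the 4.x SolrJ cloud API (the ZooKeeper address and collection name are placeholders):

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.Slice;

public class CheckShardStates {
  public static void main(String[] args) throws Exception {
    CloudSolrServer solr = new CloudSolrServer("localhost:2181");
    solr.connect();
    ClusterState state = solr.getZkStateReader().getClusterState();
    // after a successful split, e.g. shard1 should go inactive while shard1_0 and shard1_1 go active
    for (Slice slice : state.getSlices("collection1")) {
      System.out.println(slice.getName() + " -> " + slice.getState());
    }
    solr.shutdown();
  }
}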
Re: solrcloud no server hosting shard
Hi Giuseppe; First of all, you should give us the full error log so we can understand the reason behind the error. On the other hand, it is not a must to have extra replicas for your shards, but you really should consider having them. When you start up a new Solr instance it will be assigned to one of your shards, directed by the ZooKeeper ensemble in a round-robin process. Thanks; Furkan KAMACI On Wednesday, 18 December 2013, gf80 giuseppe_fe...@hotmail.com wrote: Hi guys, before starting, note that I am new to Solr and in particular to SolrCloud. I have to index many, many documents (10 million). Last week I completed my import handler and configuration, so I started the import activity on Solr using SolrCloud with 10 shards (and without replicas :S ) on VMs with 30 GB of RAM and good performance (I don't know if 10 is too much). Today I saw that during a specific update (delete of a wrong document) the following exception was thrown: org.apache.solr.common.SolrException: no servers hosting shard: at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:148) at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:118) at java.util.concurrent.FutureTask.run(Unknown Source) at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) at java.util.concurrent.FutureTask.run(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) After restarting SolrCloud the exception is thrown during the update of another document. Nevertheless, search queries work fine, but slowly. I thank you so much in advance if you can help me with this exception or if you have any suggestion for my configuration. Is it a must to have some replicas of shards? Can I add a replica now, after some millions of documents have been indexed? To configure SolrCloud I essentially used the default configuration and I have read the general SolrCloud wiki; are there any suggestions for using Solr with this number of documents in a more comfortable way? Thanks again, Giuseppe p.s. sorry for my english :) -- View this message in context: http://lucene.472066.n3.nabble.com/solrcloud-no-server-hosting-shard-tp4107268.html Sent from the Solr - User mailing list archive at Nabble.com.
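On the "can I add a replica now" question: in Solr 4.5/4.6 a replica can be added after the fact by creating a new core that names the existing collection and shard; the new core then pulls the shard's index via replication on its own. A hedged SolrJ sketch (host, collection, and shard names are placeholders; run it against the node that should host the new replica):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class AddReplica {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://newnode:8983/solr");
    CoreAdminRequest.Create create = new CoreAdminRequest.Create();
    create.setCoreName("mycollection_shard3_replica2");
    create.setCollection("mycollection"); // join this collection...
    create.setShardId("shard3");          // ...as a new replica of this shard
    create.process(solr);
  }
}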
Re: No registered leader was found, but the UI says that I have.
Hi; Do you have any error log for the leader election? Also, do you get this error always, or just while the other replica is in recovery mode? Thanks; Furkan KAMACI On Wednesday, 18 December 2013, yriveiro yago.rive...@gmail.com wrote: I'm getting an error on Solr 4.6.0 about leader registration, the admin shows this: http://picpaste.com/a839446d0808df205aa7be78c780ed32.png But my logs say: ERROR - dat6 - 2013-12-18 11:43:54.253; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: No registered leader was found, collection:statistics-13 slice:shard23_1 at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:484) at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:467) at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:223) at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:428) at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100) at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:89) at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:151) at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:131) at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:223) at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:116) at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:188) at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:114) at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:158) at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:99) at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58) at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:710) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:368) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:953) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
Re: PeerSync Recovery fails, starting Replication Recovery
Hi Anca; Could you check the conversation here: http://lucene.472066.n3.nabble.com/ColrCloud-IOException-occured-when-talking-to-server-at-td4061831.html Thanks; Furkan KAMACI On Wednesday, 18 December 2013, Anca Kopetz anca.kop...@kelkoo.com wrote: Hi, In our SolrCloud cluster (2 shards, 8 replicas), the replicas go from time to time into recovering state, and it takes more than 10 minutes for them to finish recovering. In the logs, we see that PeerSync recovery fails with the message: PeerSync: core=fr_green url=http://solr-08/searchsolrnodefr too many updates received since start - startingUpdates no longer overlaps with our currentUpdates Then Replication Recovery starts. Is there something we can do to avoid the failure of PeerSync recovery so that the recovery process is more rapid (less than 10 minutes)? The full trace log is here: 2013-12-05 13:51:53,740 [http-8080-46] INFO org.apache.solr.handler.admin.CoreAdminHandler:handleRequestRecoveryAction:705 - It has been requested that we recover 2013-12-05 13:51:53,740 [http-8080-112] INFO org.apache.solr.handler.admin.CoreAdminHandler:handleRequestRecoveryAction:705 - It has been requested that we recover 2013-12-05 13:51:53,740 [http-8080-112] INFO org.apache.solr.servlet.SolrDispatchFilter:handleAdminRequest:658 - [admin] webapp=null path=/admin/cores params={action=REQUESTRECOVERY&core=fr_green&wt=javabin&version=2} status=0 QTime=0 2013-12-05 13:51:53,740 [Thread-1544] INFO org.apache.solr.cloud.ZkController:publish:1017 - publishing core=fr_green state=recovering 2013-12-05 13:51:53,741 [http-8080-46] INFO org.apache.solr.servlet.SolrDispatchFilter:handleAdminRequest:658 - [admin] webapp=null path=/admin/cores params={action=REQUESTRECOVERY&core=fr_green&wt=javabin&version=2} status=0 QTime=1 2013-12-05 13:51:53,740 [Thread-1543] INFO org.apache.solr.cloud.ZkController:publish:1017 - publishing core=fr_green state=recovering 2013-12-05 13:51:53,743 [Thread-1544] INFO org.apache.solr.cloud.ZkController:publish:1021 - numShards not found on descriptor - reading it from system property 2013-12-05 13:51:53,746 [Thread-1543] INFO org.apache.solr.cloud.ZkController:publish:1021 - numShards not found on descriptor - reading it from system property 2013-12-05 13:51:53,755 [Thread-1543] WARN org.apache.solr.cloud.RecoveryStrategy:close:105 - Stopping recovery for zkNodeName=solr-08_searchsolrnodefr_fr_greencore=fr_green 2013-12-05 13:51:53,756 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:run:216 - Starting recovery process. core=fr_green recoveringAfterStartup=false 2013-12-05 13:51:53,762 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:doRecovery:495 - Finished recovery process. core=fr_green 2013-12-05 13:51:53,762 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:run:216 - Starting recovery process.
core=fr_green recoveringAfterStartup=false 2013-12-05 13:51:53,765 [RecoveryThread] INFO org.apache.solr.cloud.ZkController:publish:1017 - publishing core=fr_green state=recovering 2013-12-05 13:51:53,765 [RecoveryThread] INFO org.apache.solr.cloud.ZkController:publish:1021 - numShards not found on descriptor - reading it from system property 2013-12-05 13:51:53,767 [RecoveryThread] INFO org.apache.solr.client.solrj.impl.HttpClientUtil:createClient:103 - Creating new http client, config:maxConnections=128maxConnectionsPerHost=32followRedirects=false 2013-12-05 13:51:54,777 [main-EventThread] INFO org.apache.solr.common.cloud.ZkStateReader:process:210 - A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 18) 2013-12-05 13:51:56,804 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:doRecovery:356 - Attempting to PeerSync from http://solr-02/searchsolrnodefr/fr_green/ core=fr_green - recoveringAfterStartup=false 2013-12-05 13:51:56,806 [RecoveryThread] WARN org.apache.solr.update.PeerSync:sync:232 - PeerSync: core=fr_green url= http://solr-08/searchsolrnodefr too many updates received since start - startingUpdates no longer overlaps with our currentUpdates 2013-12-05 13:51:56,806 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:doRecovery:394 - PeerSync Recovery was not successful - trying replication. core=fr_green 2013-12-05 13:51:56,806 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:doRecovery:397 - Starting Replication Recovery. core=fr_green 2013-12-05 13:51:56,806 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:doRecovery:399 - Begin buffering updates. core=fr_green 2013-12-05 13:51:56,806 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy:replicate:127 - Attempting to replicate from http://solr-02/searchsolrnodefr/fr_green/. core=fr_green 2013-12-05 13:51:56,806 [RecoveryThread] INFO org.apache.solr.client.solrj.impl.HttpClientUtil:createClient:103 - Creating new http client,
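The "too many updates received since start" line means the gap exceeded what the transaction log retains, so PeerSync gives up and the slower replication recovery takes over. One commonly suggested knob (hedged: whether it actually avoids replication depends on the update rate and how long a replica is out) is enlarging the update log window in solrconfig.xml; numRecordsToKeep defaults to 100:

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <!-- keep more entries so PeerSync can bridge longer gaps;
       larger values cost disk space and startup log-replay time -->
  <int name="numRecordsToKeep">10000</int>
</updateLog>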
Solr 4.5 - Solr Cloud is creating new cores on random nodes
Hello all, I am currently in the process of building out a Solr cloud with Solr 4.5 on 4 nodes with some pretty hefty hardware. When we create the collection we have a replication factor of 2 and store 2 replicas per node. While we have been experimenting, which has involved bringing nodes up and down as well as tanking them with OOM errors while messing with JVM settings, we have observed a disturbing trend where we will bring nodes back up and suddenly shard x has 6 replicas spread across the nodes. These replicas will have been created with no action on our part, and we would much rather they not be created at all. I have not been able to determine whether this is a bug or a feature. If it's a bug, I will happily provide what I can to track it down. If it is a feature, I would very much like to turn it off. Any information is appreciated. Regards, Ryan Wilson rpwils...@gmail.com
email datasource connect timeout issue
Hi all, When I try to set up an email data source as described at http://wiki.apache.org/solr/MailEntityProcessor, a connect timeout exception happens. I am sure the user and password are correct, and the RSS data source also works well. Can anyone do me a favor? This issue is on Solr 4.5 with Tomcat 7; the exception information is as follows: -- Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Connection failed Processing Document # 1 at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:270) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:476) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:457) Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Connection failed Processing Document # 1 at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:410) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:323) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:231) ... 3 more Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Connection failed Processing Document # 1 at org.apache.solr.handler.dataimport.MailEntityProcessor.connectToMailBox(MailEntityProcessor.java:271) at org.apache.solr.handler.dataimport.MailEntityProcessor.getNextMail(MailEntityProcessor.java:121) at org.apache.solr.handler.dataimport.MailEntityProcessor.nextRow(MailEntityProcessor.java:112) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:469) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:408) ... 5 more Caused by: javax.mail.MessagingException: Connection timed out; nested exception is: java.net.ConnectException: Connection timed out at com.sun.mail.imap.IMAPStore.protocolConnect(IMAPStore.java:571) at javax.mail.Service.connect(Service.java:288) at javax.mail.Service.connect(Service.java:169) at org.apache.solr.handler.dataimport.MailEntityProcessor.connectToMailBox(MailEntityProcessor.java:267) ... 10 more Caused by: java.net.ConnectException: Connection timed out at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:310) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:176) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:163) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384) at java.net.Socket.connect(Socket.java:542) at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:570) at sun.security.ssl.BaseSSLSocketImpl.connect(BaseSSLSocketImpl.java:160) at com.sun.mail.util.SocketFetcher.createSocket(SocketFetcher.java:233) at com.sun.mail.util.SocketFetcher.getSocket(SocketFetcher.java:189) at com.sun.mail.iap.Protocol.init(Protocol.java:107) at com.sun.mail.imap.protocol.IMAPProtocol.init(IMAPProtocol.java:104) at com.sun.mail.imap.IMAPStore.protocolConnect(IMAPStore.java:538) ... 13 more -- Thanks in advance. Thanks, Kidd For the ideal, never give up, fighting!
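For reference, a minimal data-config.xml for the MailEntityProcessor along the lines of the wiki page (host, folder, and credentials are placeholders). The stack trace shows an SSL socket being opened, so the IMAPS port (typically 993) must be reachable from the Solr host; a plain "Connection timed out" at socketConnect usually points to a network or firewall issue rather than bad credentials:

<dataConfig>
  <document>
    <entity processor="MailEntityProcessor"
            user="someone@example.com"
            password="secret"
            host="imap.example.com"
            protocol="imaps"
            folders="INBOX"
            fetchMailsSince="2013-12-01 00:00:00"/>
  </document>
</dataConfig>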
Re: Solr-839 and version 4.5 (XmlQueryParser)
Hi, Just in case it is of use to anyone, I managed to compile the 4.0 patch by changing the line where the new CoreParser is created to the one below. CoreParser parser = new CoreParser(defaultField, getReq().getSchema().getQueryAnalyzer()); The parser seems to work for the simple tests that I have done so far. Regards Puneet On Tue, Dec 17, 2013 at 10:18 PM, Daniel Collins danwcoll...@gmail.com wrote: Do you need it? Our workaround was to pass null; from what we could tell, the (Lucene) QueryParser which it needs is only used for parsing UserQuery constructs, and we never used that construct. The problem is that SolrQueryParser is derived from Solr's QueryParser class, which has now diverged from the Lucene one. Will try to get our patches updated and issued over Xmas. On 17 December 2013 14:53, Puneet Pawaia puneet.paw...@gmail.com wrote: Hi All, Not being a Java expert, I used Daniel Collins' modification to patch the version 4.0 source. It works for a start; I have not been able to test much. Next, I tried the same modifications with Solr 4.6.0. This throws up 2 errors. I resolved public Query parse() throws ParseException { by changing it to public Query parse() throws SyntaxError { However, I am not able to get the second error resolved. SolrQueryParser lparser; CoreParser parser = new CoreParser(getReq().getSchema().getQueryAnalyzer(), lparser); CoreParser does not take SolrQueryParser as its parameter; it asks for a QueryParser. Is there something I am missing or should be doing that I am not doing? TIA Regards Puneet
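Piecing the thread together, a hedged sketch of what the patched query parser plugin might look like against the 4.6 API (the class name is illustrative and this is not the actual SOLR-839 patch; following Daniel's workaround it never supplies an inner QueryParser, so UserQuery elements remain unsupported):

import java.io.ByteArrayInputStream;
import org.apache.lucene.queryparser.xml.CoreParser;
import org.apache.lucene.search.Query;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;
import org.apache.solr.search.SyntaxError;

public class XmlQParserPlugin extends QParserPlugin {
  public void init(NamedList args) {}

  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    return new QParser(qstr, localParams, params, req) {
      public Query parse() throws SyntaxError {
        CoreParser parser = new CoreParser(
            getReq().getSchema().getDefaultSearchFieldName(),
            getReq().getSchema().getQueryAnalyzer());
        try {
          // the query string itself is the XML query document
          return parser.parse(new ByteArrayInputStream(qstr.getBytes("UTF-8")));
        } catch (Exception e) {
          throw new SyntaxError("Failed to parse XML query: " + e.getMessage());
        }
      }
    };
  }
}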
Re: PostingsSolrHighlighter
Hi Josip that's quite weird; in my experience highlighting is strict on string fields, which need an exact match, while text fields should be fine. I copied your schema definition and did a quick test in a new core, everything else default from the tutorial, with the search component using solr.HighlightComponent. Searching on searchable_text can highlight text: I copied your search URL and just changed the host part, the input parameters are exactly the same, and the result is attached. Can you upload your complete solrconfig.xml and schema.xml? On 18 December 2013 19:02, Josip Delic j...@lugensa.com wrote: Am 18.12.2013 09:55, schrieb Liu Bo: hi Josip for the 1st question we've done similar things: copying the search field to a text field. But highlighting is normally on specific fields such as title. Depending on how the search content is displayed to the front end, you can search on text and highlight on the field you want by specifying hl.fl ref: http://wiki.apache.org/solr/HighlightingParameters#hl.fl hi liu, that's exactly what i'm doing in that pastebin: http://pastebin.com/13Uan0ZF I'm searching there for 'q=searchable_text:labore'; this is present in 'text' and in the copyfield 'searchable_text', but it is not highlighted in 'text' (hl.fl=text). The same query is working if I set 'q=text:labore', as you can see in http://pastebin.com/4CP8XKnr For the 2nd question I figured out that the PostingsSolrHighlighter ellipsis is not, as I thought, for adding an ellipsis at the start or/and end of the highlighted text. It is instead used to join multiple snippets together if snippets > 1. cheers josip On 17 December 2013 02:29, Josip Delic j...@lugensa.com wrote: Hi @all, i am playing with the PostingsSolrHighlighter. I'm running solr 4.6.0 and my configuration is from here: https://lucene.apache.org/solr/4_6_0/solr-core/org/apache/solr/highlight/PostingsSolrHighlighter.html Search query and result (not working): http://pastebin.com/13Uan0ZF Schema (not complete): http://pastebin.com/JGa38UDT Search query and result (working): http://pastebin.com/4CP8XKnr Solr config: <searchComponent class="solr.HighlightComponent" name="highlight"> <highlighting class="org.apache.solr.highlight.PostingsSolrHighlighter"/> </searchComponent> So this is working just fine, but now i have some questions: 1.) With the old default highlighter component it was possible to search in searchable_text and to retrieve highlighted text. This is essential, because we use copyfield to put almost everything into searchable_text (title, subtitle, description, ...) 2.) I can't get the ellipsis working; i tried hl.tag.ellipsis=..., f.text.hl.tag.ellipsis=..., and configuring it in the RequestHandler; nothing seems to work, and maxAnalyzedChars is just cutting the sentence? Kind Regards Josip Delic -- All the best Liu Bo
http://localhost:8080/solr/try/select?wt=json&fl=text%2Cscore&hl=true&hl.fl=text&q=%28searchable_text%3Alabore%29&rows=10&sort=score+desc&start=0
{
  "responseHeader": {
    "status": 0,
    "QTime": 36,
    "params": {
      "sort": "score desc",
      "fl": "text,score",
      "start": "0",
      "q": "(searchable_text:labore)",
      "hl.fl": "text",
      "wt": "json",
      "hl": "true",
      "rows": "10"
    }
  },
  "response": {
    "numFound": 3,
    "start": 0,
    "docs": [
      { "text": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet." },
      { "text": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet." },
      { "text": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata
Concurrent request configurations for Solr Processors
Hi All, I have written a custom update request processor and configured an UpdateRequestProcessorChain in solrconfig.xml as below:

<updateRequestProcessorChain name="stanbolInterceptor">
  <processor class="com.solr.stanbol.processor.StanbolContentProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

Can I please know how I can configure the number of concurrent requests for my processor? What is the default number of concurrent requests per Solr processor? Thanks, Dileepa
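For what it's worth on the concurrency model: Solr builds a fresh processor chain per incoming update request, calling each factory's getInstance() once per request, so there is no per-processor concurrency setting; the number of concurrent requests is bounded by the servlet container's worker threads (e.g. Tomcat's maxThreads). Anything shared across requests must live in the factory and be thread-safe. A minimal sketch of the factory/processor pair (the Stanbol enhancement logic is elided; the class name follows the config above):

import java.io.IOException;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class StanbolContentProcessorFactory extends UpdateRequestProcessorFactory {
  // shared state, if any, belongs here and must be thread-safe

  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    // called once per request: each request gets its own processor instance
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        // ... enhance cmd.getSolrInputDocument() via Stanbol here ...
        super.processAdd(cmd); // hand the document to the next processor in the chain
      }
    };
  }
}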