Re: Solr query syntax.
I'm using the default qparser that comes with Solr 4.4. Is there anything better?
luke 4.5.0 released
Hello! I have just released luke 4.5.0 along with the binary. Its version number reflects the Lucene version underneath. Feel free to test this and give feedback / submit bug fixes / patches. https://github.com/DmitryKey/luke/releases/tag/4.5.0 Thanks. -- Dmitry Blog: http://dmitrykan.blogspot.com Twitter: twitter.com/dmitrykan
SolrCloud FunctionQuery inconsistency
Hi, I have a SolrCloud setup with 4 shards. They are running normally. How is it possible that the same function query returns different results, even within the same shard? However, when sorting by ptime desc, the result is consistent. The dateDeboost function generates a time weight from ptime, which is multiplied by the score. The result is as follows:

{"responseHeader":{"status":0,"QTime":7,"params":{"fl":"id","shards":"shard3","cache":"false","indent":"true","start":"0","q":"{!boost b=dateDeboost(ptime)}channelid:0082 (title:\"abc\" || dkeys:\"abc\")","wt":"json","rows":"5"}},
 "response":{"numFound":121,"start":0,"maxScore":0.5319116,"docs":[
   {"id":"9EORHN5I00824IHR"},
   {"id":"9EOPQGOI00824IMP"},
   {"id":"9EMATM6900824IHR"},
   {"id":"9EJLBOEN00824IHR"},
   {"id":"9E6V45IM00824IHR"}]
 }}

{"responseHeader":{"status":0,"QTime":6,"params":{"fl":"id","shards":"shard3","cache":"false","indent":"true","start":"0","q":"{!boost b=dateDeboost(ptime)}channelid:0082 (title:\"abc\" || dkeys:\"abc\")","wt":"json","rows":"5"}},
 "response":{"numFound":121,"start":0,"maxScore":0.5319117,"docs":[
   {"id":"9EOPQGOI00824IMP"},
   {"id":"9EORHN5I00824IHR"},
   {"id":"9EMATM6900824IHR"},
   {"id":"9EJLBOEN00824IHR"},
   {"id":"9E1LP3S300824IHR"}]
 }}
some cores go down during indexing
Hi, I have a strange situation. During indexing, some of the cores go down:

ZkController.publish(1017) | publishing core=shops5 state=down
ZkController.register(785) | Register replica - core:shops5 address: http://host77:8280/solr collection:shops5 shard:shard1
ZkController.register(810) | We are http://host77:8280/solr/shops5/ and leader is http://host136:8280/solr/shops5/

After that the core doesn't register as working in the cloud, even though it should (it can process requests). For now I can only restart Solr to fix the situation. Reloading the core doesn't help. Has someone faced a similar problem? Some info: multiple Solr 4.5.1 nodes in SolrCloud, 3x ZooKeeper.

http://host77:8280/solr/#/shops5/replication:
  Master (Searching)  Index Version 1385955301185  Gen 67  Size 127.62 KB
  Master (Replicable) Index Version 1385955301185  Gen 67  Size -
http://host136:8280/solr/#/shops5/replication:
  Master (Searching)  Index Version 1385955301218  Gen 68  Size 127.65 KB
  Master (Replicable) Index Version 1385955301218  Gen 68  Size -
http://host141:8280/solr/#/shops5/replication:
  Master (Searching)  Index Version 1385955301265  Gen 68  Size 127.37 KB
  Master (Replicable) Index Version 1385955301265  Gen 68  Size -

Logs from another core:

ZkController.publish(1017) | publishing core=shops3 state=down
ZkController.register(785) | Register replica - core:shops3 address: http://host77:8280/solr collection:shops3 shard:shard1
ZkController.register(810) | We are http://host77:8280/solr/shops3/ and leader is http://host136:8280/solr/shops3/
ZkController.register(841) | No LogReplay needed for core=shops3 baseURL=http://host77:8280/solr
ZkController.checkRecovery(993) | Core needs to recover:shops3
RecoveryStrategy.run(216) | Starting recovery process. core=shops3 recoveringAfterStartup=false
ZkController.publish(1017) | publishing core=shops3 state=recovering
RecoveryStrategy.doRecovery(356) | Attempting to PeerSync from http://host136:8280/solr/shops3/ core=shops3 - recoveringAfterStartup=false
RecoveryStrategy.doRecovery(368) | PeerSync Recovery was successful - registering as Active. core=shops3
ZkController.publish(1017) | publishing core=shops3 state=active
SolrCore.registerSearcher(1812) | [shops3] Registered new searcher Searcher@45df7f8c main{StandardDirectoryReader(segments_ik:1977:nrt _n1(4.5.1):C97)}
PeerSync.sync(186) | PeerSync: core=shops3 url=http://host77:8280/solr START replicas=[http://host136:8280/solr/shops3/] nUpdates=100
PeerSync.handleVersions(346) | PeerSync: core=shops3 url=http://host77:8280/solr Received 97 versions from host136:8280/solr/shops3/
PeerSync.handleVersions(399) | PeerSync: core=shops3 url=http://host77:8280/solr Our versions are newer. ourLowThreshold=1453188869165940736 otherHigh=1453279151809101824
PeerSync.sync(272) | PeerSync: core=shops3 url=http://host77:8280/solr DONE. sync succeeded

The above lines are missing for core shops5.

-- Grzegorz Sobczyk
Solr non-supported languages
Hi, I have a requirement to index and search a few languages that are not supported by Solr (e.g. the languages of countries like Slovenia, Moldova, Belarus, etc.). If I need to do only exact matching against these languages, what sort of analyzer and tokenizers would suit? Thanks, Prasi
Re: Solr non-supported languages
Hi Prasi, the text_general field type that ships with the example schema.xml would suit. On Monday, December 2, 2013 12:35 PM, Prasi S prasi1...@gmail.com wrote: Hi, I have a requirement to index and search a few languages that are not supported by Solr (e.g. the languages of countries like Slovenia, Moldova, Belarus, etc.). If I need to do only exact matching against these languages, what sort of analyzer and tokenizers would suit? Thanks, Prasi
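If strictly exact matching is all that is needed, another option worth considering (sketched here purely as an illustration, not something proposed in this thread) is a language-agnostic field type that keeps each value as a single lowercased token:

    <fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

With this type a query only matches when the whole field value is identical (ignoring case), which sidesteps language-specific stemming entirely.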
Re: Solr query syntax.
Hi, The choice of query parser depends on your needs. I am just surprised that you used prefix notation in your example. The default query parser syntax for and(blabla , name: george) is q=blabla AND name:george. The term blabla (which does not specify a field) is parsed against the default search field. The default field is set via the df parameter. https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser On Monday, December 2, 2013 10:17 AM, elmerfudd na...@012.net.il wrote: I'm using the default qparser that comes with Solr 4.4. Is there anything better?
Re: Solr query syntax.
The edismax (ExtendedDisMax) query parser is the best overall. There are other specialized query parsers with features that edismax does not have (e.g., surround for span queries, and complex phrase for wildcards in phrases). -- Jack Krupansky -Original Message- From: elmerfudd Sent: Monday, December 02, 2013 3:17 AM To: solr-user@lucene.apache.org Subject: Re: Solr query syntax. I'm using the default qparser that comes with Solr 4.4. Is there anything better?
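For readers wondering how to switch parsers: edismax can be selected per request with the defType parameter, or set as a default on the request handler in solrconfig.xml. A minimal request might look like this (the field names in qf are illustrative, not taken from this thread):

    /select?defType=edismax&q=george&qf=title^2 body&fl=id,title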
Re: Constantly increasing time of full data import
Update: I can see that the times increase when the search load is higher. During nights and weekends the full-import times don't increase. So it is not caused by the number of documents being loaded (during weekends we have the same number of new documents) but by the number of queries per minute. Has anyone observed such strange behaviour? It is critical for us. Best, Michal
How does Whatsapp apply search techniques to conversations?
I was just wondering what Whatsapp uses to implement search in the conversation history. Is it the same thing used by all kinds of Android apps supporting search on chats/conversations? Has anyone implemented something along similar lines? Thanks, Kumar Anurag
Re: Auto optimized of Solr indexing results
TieredMergePolicy is the default; even though it's commented out in solrconfig.xml, it's still being used. So there's nothing to do. Given the size of your index, you can actually do whatever you please. Optimizing it will shrink its size, but frankly your index is so small I doubt you'll see any noticeable difference. The deleted docs will self-purge as you re-crawl eventually. In all, I think you can mostly ignore the issue. Best, Erick On Sun, Dec 1, 2013 at 8:00 PM, Bayu Widyasanyata bwidyasany...@gmail.com wrote: Hi Erick, After waiting for some days, about a week (I did daily crawling and indexing), here is the docs summary: Num Docs: 9738, Max Doc: 15311, Deleted Docs: 5573, Version: 781, Segment Count: 5. The percentage of deletedDocs relative to numDocs is near 57%. On the other hand, the TieredMergePolicy section in solrconfig.xml is still commented out:

    <!--
      <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
        <int name="maxMergeAtOnce">10</int>
        <int name="segmentsPerTier">10</int>
      </mergePolicy>
    -->

Should we enable it and wait for the effect? Thanks! On Wed, Nov 20, 2013 at 9:55 PM, Bayu Widyasanyata bwidyasany...@gmail.com wrote: Thanks Erick. I will check that on the next round. --- wassalam, [bayu] /sent from Android phone/ On Nov 20, 2013 7:45 PM, Erick Erickson erickerick...@gmail.com wrote: You probably shouldn't optimize at all. The default TieredMergePolicy will eventually purge the deleted files' data, which is really what optimize does. So despite its name, most of the time it's not really worth the effort. Take a look at your Solr admin page, the overview link for a core. If the number of deleted docs is a significant percentage of your numDocs (I typically use 20% or so, but YMMV) then optimize might be worthwhile. Otherwise, it's a distraction unless and until you have some evidence that it actually makes a difference. Best, Erick On Wed, Nov 20, 2013 at 7:33 AM, Bayu Widyasanyata bwidyasany...@gmail.com wrote: Hi, After successfully configuring the re-crawling script, I sometimes checked and found on the Solr Admin that the Optimized status of my collection is "not optimized" (slash icon). Hence I did the optimize steps manually. How can I make my crawling optimize automatically? Should we restart Solr (I use Tomcat) as shown here [1]? [1] http://wiki.apache.org/nutch/Crawl Thanks! -- wassalam, [bayu] -- wassalam, [bayu]
Re: luke 4.5.0 released
Excellent! Thanks! On Mon, Dec 2, 2013 at 3:27 AM, Dmitry Kan solrexp...@gmail.com wrote: Hello! I have just released luke 4.5.0 along with the binary. Its version number reflects the Lucene version underneath. Feel free to test this and give feedback / submit bug fixes / patches. https://github.com/DmitryKey/luke/releases/tag/4.5.0 Thanks. -- Dmitry Blog: http://dmitrykan.blogspot.com Twitter: twitter.com/dmitrykan
Re: SolrCloud FunctionQuery inconsistency
I'm not quite sure what you're seeing as inconsistent; you didn't say. Is it the maxScore? Did you index any docs in the meantime? Even though both responses show 121 docs, if you updated some docs it might affect the score, because the terms from the old docs still affect the tf/idf calcs and thus the boosted score. Or if an optimize or merge happened, that might also affect things. Best, Erick On Mon, Dec 2, 2013 at 3:33 AM, sling sling...@gmail.com wrote: Hi, I have a SolrCloud setup with 4 shards. They are running normally. How is it possible that the same function query returns different results, even within the same shard? However, when sorting by ptime desc, the result is consistent. The dateDeboost function generates a time weight from ptime, which is multiplied by the score.
ShardSplit errors..
Hi, I have been trying to split a shard with little success. I'm probably missing something obvious but would appreciate a little help.

Solr version: 4.6.0
Number of documents in the shard: 2,933,059
Index size: 6.52

I know I have some setting somewhere that I need to change, but I believe I have changed everything available. At first I had a write.lock timeout, so I upped that setting and got past it.

Command:
curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=sessionfilterset&shard=shard1'

Message returned:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader"><int name="status">500</int><int name="QTime">300023</int></lst>
  <lst name="error">
    <str name="msg">splitshard the collection time out:300s</str>
    <str name="trace">org.apache.solr.common.SolrException: splitshard the collection time out:300s
      at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:204)
      at org.apache.solr.handler.admin.CollectionsHandler.handleSplitShardAction(CollectionsHandler.java:422)
      at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:158)
      at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
      at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:662)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
      at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
      at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
      at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
      at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
      at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
      at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
      at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
      at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
      at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
      at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
      at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
      at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
      at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
      at org.eclipse.jetty.server.Server.handle(Server.java:368)
      at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
      at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
      at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
      at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
      at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
      at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
      at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
      at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
      at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
      at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
      at java.lang.Thread.run(Thread.java:722)
    </str>
    <int name="code">500</int>
  </lst>
</response>

During the run I'm not convinced it's doing anything: no split directory is created, CPU remains very low and memory doesn't seem to spike. Any help would be greatly appreciated. -- Annette Newton Database Administrator ServiceTick Ltd T: +44(0)1603 618326 Seebohm House, 2-4 Queen Street, Norwich, England NR2 4SQ www.servicetick.com www.sessioncam.com
ANNOUNCE: Apache Solr Reference Guide 4.6
The Lucene PMC is pleased to announce the release of the Apache Solr Reference Guide for Solr 4.6. This 347-page PDF serves as the definitive user's manual for Solr 4.6. The Solr Reference Guide is available for download from the Apache mirror network: https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/ (If you have followup questions, please send them only to solr-user@lucene.apache.org) -Hoss
Re: Function query matching
I'm pursuing this possible PostFilter solution. I can see how to collect all the hits and recompute the scores in a PostFilter, after all the hits have been collected (for scaling). Now, I can't see how to get the custom doc/score values back into the main query's HitQueue. Any advice? Thanks, Peter On Fri, Nov 29, 2013 at 9:18 AM, Peter Keegan peterlkee...@gmail.com wrote: Instead of using a function query, could I use the edismax query (plus some low-cost filters not shown in the example) and implement the scale/sum/product computation in a PostFilter? Is the query's maxScore available there? Thanks, Peter On Wed, Nov 27, 2013 at 1:58 PM, Peter Keegan peterlkee...@gmail.com wrote: Although the 'scale' is a big part of it, here's a closer breakdown. Here are 4 queries with increasing functions, and their response times (caching turned off in solrconfig):

100 msec: select?q={!edismax v='news' qf='title^2 body'}
135 msec: select?qq={!edismax v='news' qf='title^2 body'}&q={!func}product(field(myfield),query($qq))&fq={!query v=$qq}
200 msec: select?qq={!edismax v='news' qf='title^2 body'}&q={!func}sum(product(0.75,query($qq)),product(0.25,field(myfield)))&fq={!query v=$qq}
320 msec: select?qq={!edismax v='news' qf='title^2 body'}&scaledQ=scale(product(query($qq),1),0,1)&q={!func}sum(product(0.75,$scaledQ),product(0.25,field(myfield)))&fq={!query v=$qq}

Btw, that no-op product is necessary, else you get this exception: org.apache.lucene.search.BooleanQuery$BooleanWeight cannot be cast to org.apache.lucene.queries.function.valuesource.ScaleFloatFunction$ScaleInfo thanks, peter On Wed, Nov 27, 2013 at 1:30 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : So, this query does just what I want, but it's typically 3 times slower : than the edismax query without the functions: that's because the scale() function is inherently slow (it has to compute the min and max value for every document in order to know how to scale them). What you are seeing is the price you have to pay to get that query with a normalized 0-1 value. (You might be able to save a little bit of time by eliminating that no-op multiply by 1: product(query($qq),1) ... but I doubt you'll even notice much of a change given that scale function.) : Is there any way to speed this up? Would writing a custom function query : that compiled all the function queries together be any faster? If you can find a faster implementation for scale() then by all means let us know, and we can fold it back into Solr. -Hoss
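For anyone exploring the PostFilter route Peter mentions, the plumbing looks roughly like the sketch below. This is only a skeleton under the assumption that a custom QParserPlugin produces the query; it does not implement the scale/sum/product math from the thread, and pushing modified scores on to the delegate would additionally require wrapping the Scorer it receives.

    import java.io.IOException;

    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Scorer;
    import org.apache.solr.search.DelegatingCollector;
    import org.apache.solr.search.ExtendedQueryBase;
    import org.apache.solr.search.PostFilter;

    // Skeleton of a post filter: it sees every document that survived the main
    // query and the cheaper filters, and decides whether to pass it on.
    public class RescorePostFilter extends ExtendedQueryBase implements PostFilter {

      @Override
      public boolean getCache() {
        return false;                              // post filters must not be cached
      }

      @Override
      public int getCost() {
        return Math.max(super.getCost(), 100);     // and must run at cost >= 100
      }

      @Override
      public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
        return new DelegatingCollector() {
          private Scorer matchScorer;

          @Override
          public void setScorer(Scorer scorer) throws IOException {
            matchScorer = scorer;     // the original score of the current doc comes from here
            super.setScorer(scorer);  // the delegate (feeding the HitQueue) sees the same scorer
          }

          @Override
          public void collect(int doc) throws IOException {
            float originalScore = matchScorer.score();
            // Inspect originalScore / per-doc values here and simply return to drop the doc.
            // Note: the delegate still records the original score; feeding it a recomputed
            // score would mean handing super.setScorer() a wrapping Scorer instead.
            super.collect(doc);
          }
        };
      }
    }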
Re: SolrCloud FunctionQuery inconsistency
: However, when sorting by ptime desc, the result is consistent. : The dateDeboost function generates a time weight from ptime, which is multiplied by : the score. As Erick mentioned, you haven't given us enough details to make any educated guesses as to what problem you are seeing. My wild, uneducated, shot-in-the-dark guess: are you populating ptime using a default of NOW? If so, can you rule out the function as an issue by asking for fl=id,ptime and confirming that the ptime for these documents sometimes varies slightly? NOTE: Although it is possible to configure a TrieDateField instance with a default value of NOW to compute a timestamp of when the document was indexed, this is not advisable when using SolrCloud since each replica of the document may compute a slightly different value. TimestampUpdateProcessorFactory is recommended instead. https://lucene.apache.org/solr/4_6_0/solr-core/org/apache/solr/schema/TrieDateField.html https://lucene.apache.org/solr/4_6_0/solr-core/org/apache/solr/update/processor/TimestampUpdateProcessorFactory.html -Hoss
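For anyone following along, a minimal update chain using TimestampUpdateProcessorFactory might look like the sketch below (the chain name is made up; ptime is the field from this thread). The chain then has to be referenced by the update handler, e.g. via the update.chain parameter or as a default on the update request handler:

    <updateRequestProcessorChain name="add-ptime">
      <processor class="solr.TimestampUpdateProcessorFactory">
        <str name="fieldName">ptime</str>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>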
Re: solr as a service for multiple projects in the same environment
I think that one experience in this area could be provided by Trey Grainger, author of Solr in Action; I believe that some of his work at CareerBuilder involved the creation of something (somehow) similar to what you're trying to accomplish. I must say that I'm also interested in this topic, but haven't had the time to really do anything about it. ----- Original Message ----- From: adfel70 adfe...@gmail.com To: solr-user@lucene.apache.org Sent: Sunday, December 1, 2013 2:41:00 Subject: Re: solr as a service for multiple projects in the same environment The risk is that if you by mistake mess up a cluster while doing maintenance on one of the systems, you can affect the other system. It's a pretty amorphous risk. Aside from having multiple systems share the same hardware resources, I don't see any other real risk. Do your collections share the same topology in terms of shards and replicas? Do you manually configure the nodes on which each collection is created so that you'll still have some level of separation between the systems? michael.boom wrote: Hi, There's nothing unusual in what you are trying to do; this scenario is very common. To answer your questions: 1. as I understand I can separate the configs of each collection in zookeeper. is it correct? Yes, that's correct. You'll have to upload your configs to ZK and use the Collections API to create your collections. 2. are there any solr operations that can be performed on collection A and somehow affect collection B? No, I can't think of any cross-collection operation. Here you can find a list of collection-related operations: https://cwiki.apache.org/confluence/display/solr/Collections+API 3. is the solr cache separated for each collection? Yes, separate and configurable in solrconfig.xml for each collection. 4. I assume that I'll encounter a problem with the OS cache, when the different indices compete for the same memory, right? how severe is this issue? Hardware can be a bottleneck. If all your collections will face the same load you should try to give Solr a RAM amount equal to the index size (all indexes). 5. any other advice on building such an architecture? does the maintenance overhead of maintaining multiple clusters in production really outweigh the problems and risks of using the same cluster for multiple systems? I was in the same situation as you, and putting everything in multiple collections in just one cluster made sense for me: it's easier to manage and has no obvious downside. As for the risks of using the same cluster for multiple systems, they are pretty much the same in both scenarios. Only that with multiple clusters you'll have many more machines to manage.
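For reference, the per-system config separation described above boils down to uploading one configset per system to ZooKeeper and referencing it when the collection is created. The names, hosts and sizing below are illustrative only, not taken from this thread:

    ./zkcli.sh -zkhost zk1:2181 -cmd upconfig -confdir /path/to/systemA/conf -confname systemA
    curl 'http://host:8983/solr/admin/collections?action=CREATE&name=systemA&numShards=2&replicationFactor=2&collection.configName=systemA&createNodeSet=node1:8983_solr,node2:8983_solr'

The optional createNodeSet parameter is one way to keep each system's replicas on a designated subset of nodes.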
Proxy.php tutorials for AJAX Solr
Are there any good tutorials that cover how to integrate the suggested PHP proxy with the JavaScript framework AJAX Solr? Here is the proxy: https://gist.github.com/evolvingweb/298580 Also on Stack Overflow: http://stackoverflow.com/questions/20338073/proxy-php-tutorials-for-ajax-solr
Re: Error integrating opennlp in solr
Did you check here: http://wiki.apache.org/solr/OpenNLP On Saturday, November 30, 2013, Arti a...@j9ventures.com wrote: Hi Team, I am getting the stack of errors given below while integrating Solr with OpenNLP. Please help.

Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType text_opennlp: Plugin init failure for [schema.xml] analyzer/tokenizer: Error loading class 'solr.OpenNLPTokenizerFactory'
  at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
  at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:467)
  ... 15 more
Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] analyzer/tokenizer: Error loading class 'solr.OpenNLPTokenizerFactory'

With Regards, Arti Lamba - Dainik Jagran - Largest Read Daily of India with 56.5 Million Readers. (Source: Indian Readership Survey 2012 Q4) www.jagran.com www.jplcorp.in www.adrates.jagran.com
Re: Error integrating opennlp in solr
Especially here: "Also, you may have to add the OpenNLP lib directory to your solr/lib or solr/cores/collection/lib directory. The text types assume that cores/collection/conf/opennlp contains the OpenNLP model files." On Tuesday, December 3, 2013, Furkan KAMACI furkankam...@gmail.com wrote: Did you check here: http://wiki.apache.org/solr/OpenNLP On Saturday, November 30, 2013, Arti a...@j9ventures.com wrote: Hi Team, I am getting the stack of errors given below while integrating Solr with OpenNLP. Please help. Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType text_opennlp: Plugin init failure for [schema.xml] analyzer/tokenizer: Error loading class 'solr.OpenNLPTokenizerFactory' at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177) at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:467) ... 15 more Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] analyzer/tokenizer: Error loading class 'solr.OpenNLPTokenizerFactory' With Regards, Arti Lamba - Dainik Jagran - Largest Read Daily of India with 56.5 Million Readers. (Source: Indian Readership Survey 2012 Q4) www.jagran.com www.jplcorp.in www.adrates.jagran.com
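If declaring the jars explicitly is preferred over a shared lib directory, solrconfig.xml also accepts lib directives. The paths below are assumptions about where the OpenNLP jars might be placed, not something stated in this thread:

    <lib dir="../../contrib/opennlp/lib" regex=".*\.jar" />
    <lib dir="../../contrib/opennlp/lucene-libs" regex=".*\.jar" />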
Re: SolrCloud FunctionQuery inconsistency
Thanks, Erick. I mean the first id of the results is not consistent, and neither is the maxScore. While querying, I do index docs at the same time, but they are not relevant to this query. The updated docs cannot affect the tf calculations, and as for idf, they should affect all docs equally, so the results should be consistent. But for the same query, it shows a different sort (either sort A or sort B) over and over. Thanks, sling
Re: SolrCloud FunctionQuery inconsistency
Thank for your reply, Chris. Yes, I am populating ptime using a default of NOW. I only store the id, so I can't get ptime values. But from the perspective of business logic, ptime should not change. Strangely, the sort result is consistent now... :( I should do more test case... -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-FunctionQuery-inconsistency-tp4104346p4104558.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Function query matching
We're working on the same problem with the scale(query(...)) combination, so I'd like to share a bit more information that may be useful. *On the scale function:* Even though the scale query has to calculate the scores for all documents, it is actually doing this work twice for each ValueSource (once to calculate the min and max values, and then again when actually scoring the documents), which is inefficient. To solve the problem, we're in the process of putting a cache inside the scale function to remember the values for each document when they are initially computed (to find the min and max) so that the second pass can just use the previously computed values for each document. Our theory is that most of the extra time due to the scale function is really just the result of doing duplicate work. No promises this won't be overly costly in terms of memory utilization, but we'll see what we get in terms of speed improvements and will share the code if it works out well. Alternate implementation suggestions (or criticism of a cache like this) are also welcomed. *On the NoOp product function: scale(prod(1, query(...))):* We do the same thing, which ultimately is just an unnecessary waste of a loop through all documents to do an extra multiplication step. I just debugged the code and uncovered the problem. There is a Map (called context) that is passed through to each value source to store intermediate state, and both the query and scale functions are passing the ValueSource for the query function in as the KEY to this Map (as opposed to using some composite key that makes sense in the current context). Essentially, these lines are overwriting each other:

Inside ScaleFloatFunction: context.put(this.source, scaleInfo); // this.source refers to the QueryValueSource, and scaleInfo refers to a ScaleInfo object
Inside QueryValueSource: context.put(this, w); // this refers to the same QueryValueSource from above, and w refers to a Weight object

As such, when the ScaleFloatFunction later goes to read the ScaleInfo from the context Map, it unexpectedly pulls the Weight object out instead, and thus the invalid cast exception occurs. The NoOp multiplication works because it puts a different ValueSource between the query and the ScaleFloatFunction such that this.source (in ScaleFloatFunction) != this (in QueryValueSource). This should be an easy fix. I'll create a JIRA ticket to use better key names in these functions and push up a patch. This will eliminate the need for the extra NoOp function. -Trey On Mon, Dec 2, 2013 at 12:41 PM, Peter Keegan peterlkee...@gmail.com wrote: I'm pursuing this possible PostFilter solution. I can see how to collect all the hits and recompute the scores in a PostFilter, after all the hits have been collected (for scaling). Now, I can't see how to get the custom doc/score values back into the main query's HitQueue. Any advice? Thanks, Peter On Fri, Nov 29, 2013 at 9:18 AM, Peter Keegan peterlkee...@gmail.com wrote: Instead of using a function query, could I use the edismax query (plus some low-cost filters not shown in the example) and implement the scale/sum/product computation in a PostFilter? Is the query's maxScore available there? Thanks, Peter On Wed, Nov 27, 2013 at 1:58 PM, Peter Keegan peterlkee...@gmail.com wrote: Although the 'scale' is a big part of it, here's a closer breakdown.
Here are 4 queries with increasing functions, and their response times (caching turned off in solrconfig):

100 msec: select?q={!edismax v='news' qf='title^2 body'}
135 msec: select?qq={!edismax v='news' qf='title^2 body'}&q={!func}product(field(myfield),query($qq))&fq={!query v=$qq}
200 msec: select?qq={!edismax v='news' qf='title^2 body'}&q={!func}sum(product(0.75,query($qq)),product(0.25,field(myfield)))&fq={!query v=$qq}
320 msec: select?qq={!edismax v='news' qf='title^2 body'}&scaledQ=scale(product(query($qq),1),0,1)&q={!func}sum(product(0.75,$scaledQ),product(0.25,field(myfield)))&fq={!query v=$qq}

Btw, that no-op product is necessary, else you get this exception: org.apache.lucene.search.BooleanQuery$BooleanWeight cannot be cast to org.apache.lucene.queries.function.valuesource.ScaleFloatFunction$ScaleInfo thanks, peter On Wed, Nov 27, 2013 at 1:30 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : So, this query does just what I want, but it's typically 3 times slower : than the edismax query without the functions: that's because the scale() function is inherently slow (it has to compute the min and max value for every document in order to know how to scale them). What you are seeing is the price you have to pay to get that query with a normalized 0-1 value. (You might be able to save a little bit of time by eliminating that no-op multiply by 1: product(query($qq),1) ... but I doubt you'll even notice much of a change
Re: Auto optimized of Solr indexing results
Thanks Erick for your advice and for sharing. Regards, On Mon, Dec 2, 2013 at 11:06 PM, Erick Erickson erickerick...@gmail.com wrote: TieredMergePolicy is the default; even though it's commented out in solrconfig.xml, it's still being used. So there's nothing to do. Given the size of your index, you can actually do whatever you please. Optimizing it will shrink its size, but frankly your index is so small I doubt you'll see any noticeable difference. The deleted docs will self-purge as you re-crawl eventually. In all, I think you can mostly ignore the issue. Best, Erick -- wassalam, [bayu]
Re: Constantly increasing time of full data import
Michal, I don't have much experience with DIH so I'll leave that to someone else, but I would suggest you profile Solr during imports. That might show you where the bottleneck is. Generally, it's reasonable to expect Solr updates to get slower the larger the indexes get and the more load you put on the system. It's possible you're seeing something outside the norm - I just don't know what you were expecting or the capabilities of your resources. You might want to post more info (autoCommit settings, etc.) as well. Thanks, Ryan On Mon, Dec 2, 2013 at 4:22 AM, michallos michal.ware...@gmail.com wrote: Update: I can see that the times increase when the search load is higher. During nights and weekends the full-import times don't increase. So it is not caused by the number of documents being loaded (during weekends we have the same number of new documents) but by the number of queries per minute. Has anyone observed such strange behaviour? It is critical for us. Best, Michal
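For readers unfamiliar with the settings Ryan is asking about, autoCommit lives in the updateHandler section of solrconfig.xml and looks roughly like this; the values are purely illustrative, not a recommendation for this setup:

    <autoCommit>
      <maxDocs>10000</maxDocs>
      <maxTime>15000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>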
Indexing Multiple Languages with solr (Arabic & English)
Hi, I am working with Solr, indexing and searching English text with the text_general field type. Search is working fine. Now I have Arabic text, which also needs indexing and searching. Below is my basic config for English. *The same field contains ENGLISH and ARABIC text in the database.* Please guide me on this.

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

I saw the config below in the schema.xml file for the Arabic language.

<fieldType name="text_ar" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ar.txt" enablePositionIncrements="true"/>
    <filter class="solr.ArabicNormalizationFilterFactory"/>
    <filter class="solr.ArabicStemFilterFactory"/>
  </analyzer>
</fieldType>

Please suggest how to configure Arabic indexing and searching. Thanks in advance, AnilJayanti
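One common pattern, shown here only as a sketch (the field names are made up, not from this thread), is to keep a single stored field and copy it into one additional indexed field per language, then search across both:

    <field name="content" type="text_general" indexed="true" stored="true"/>
    <field name="content_ar" type="text_ar" indexed="true" stored="false"/>
    <copyField source="content" dest="content_ar"/>

Queries can then cover both analyses, e.g. with defType=edismax&qf=content content_ar.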
Using the flexible query parser in Solr instead of classic
Hi folks, last year we built a 3.x Solr query parser based on org.apache.lucene.queryparser.flexible.standard.StandardQueryParser, because we had some additions for SpanQueries and PhraseQueries. We are thinking about adapting this for 4.x. At the moment the SolrQueryParser is based on org.apache.lucene.queryparser.classic.QueryParser.jj. Is there a plan for 4.x to switch the LuceneQParser from classic to flexible (org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser.jj)? Is there a SOLR task to use the flexible QP? Does anyone else need this? Best regards, Karsten P.S. I only found one (unanswered) thread and no task about Solr and the flexible QP (thread: http://lucene.472066.n3.nabble.com/Using-the-contrib-flexible-query-parser-in-Solr-td819.html)
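For readers who haven't used it, the flexible parser Karsten refers to is driven along these lines. This is only a minimal standalone sketch against Lucene 4.x (the analyzer choice and default field name are assumptions), not how a Solr QParserPlugin wrapping it would look:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryparser.flexible.standard.StandardQueryParser;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.util.Version;

    public class FlexibleParserDemo {
      public static void main(String[] args) throws Exception {
        // StandardQueryParser is the "flexible" counterpart of the classic parser
        StandardQueryParser parser = new StandardQueryParser(new StandardAnalyzer(Version.LUCENE_46));
        // the second argument is the default field applied to terms without a field prefix
        Query q = parser.parse("title:solr AND \"query parser\"", "text");
        System.out.println(q);
      }
    }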
post filtering for boolean filter queries
Hello! We have been experimenting with post filtering lately. Our setup is a filter with a long boolean query; drawing the example from Dublin's Stump the Chump: fq=UserId:(user1 OR user2 OR...OR user1000) The underlying issue impacting performance is that the combination of user ids in the query above is unique per user in the system, and on top of that the combination changes every day. Our idea was to stop caching the filter query with {!cache=false}. Since there is no way to introspect the contents of the filter cache, to our knowledge (JMX?), we can't be sure those are not cached, because the initial query for each combination takes substantially more time (as if it was *not* cached) while the second and subsequent queries with the same fq are faster (as if it *was* cached). Question is: does post filtering support boolean queries in fq params? Another thing we have been trying is assigning a cost to the fq relatively higher than for other filter queries. Does this feature support boolean queries in fq params as well? -- Dmitry Blog: http://dmitrykan.blogspot.com Twitter: twitter.com/dmitrykan
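For clarity, the two local-params variants referred to above look like this (the user list is of course illustrative):

    fq={!cache=false}UserId:(user1 OR user2 OR ... OR user1000)
    fq={!cache=false cost=200}UserId:(user1 OR user2 OR ... OR user1000)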
Re: Best approach to multiple languages
Hi, thanks for your post. I am searching for exactly this type of multiple-language indexing and searching in Solr. Below is my post on the Lucene list. Can you please help me out with this? http://lucene.472066.n3.nabble.com/Indexing-Multiple-Languages-with-solr-Arabic-amp-English-td4104580.html Thanks in advance, aniljayanti
Re: ANNOUNCE: Apache Solr Reference Guide 4.6
But it still has the error about TrimFilterFactory in it, which I reported a couple of days back. http://www.mail-archive.com/solr-user@lucene.apache.org/msg92064.html So what's needed to correct the Reference Guide is a note under TrimFilter, like the one under StopFilter: "As of Solr 4.4, the updateOffsets argument is no longer supported." By the way, I found the solution to my question by looking into the sources. Thanks anyway. Bernd On 02.12.2013 18:28, Chris Hostetter wrote: The Lucene PMC is pleased to announce the release of the Apache Solr Reference Guide for Solr 4.6. This 347-page PDF serves as the definitive user's manual for Solr 4.6. The Solr Reference Guide is available for download from the Apache mirror network: https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/ (If you have followup questions, please send them only to solr-user@lucene.apache.org) -Hoss