Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository
Does this work? I can suggest -XX:-UseLoopPredicate to switch off loop predicates. Which version of Java 7 is recommended? Bill Bell Sent from mobile On Oct 10, 2013, at 11:29 AM, Smiley, David W. dsmi...@mitre.org wrote: *Don't* use JDK 7u40; it's been known to cause index corruption and SIGSEGV faults with Lucene: LUCENE-5212. This has not gone unnoticed by Oracle. ~ David On 10/10/13 12:34 PM, Guido Medina guido.med...@temetra.com wrote: 2. Java version: There are huge performance wins between Java 5, 6 and 7; we use Oracle JDK 7u40.
Re: SolrCore 'collection1' is not available due to init failure
org.apache.solr.core.SolrCore.init(SolrCore.java:821) ... 13 more Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/usr/share/solr-4.5.0/example/solr/collection1/data/index/write.lock: java.io.FileNotFoundException: /usr/share/solr-4.5.0/example/solr/collection1/data/index/write.lock (Permission denied) at org.apache.lucene.store.Lock.obtain(Lock.java:84) It seems to be a permission problem: the user that starts Tomcat doesn't have permission to access your index folder. Try granting read and write permission on your Solr data folder to that user, then restart Tomcat and see what happens. -- All the best Liu Bo
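A minimal sketch of the suggested fix. The data path comes from the stack trace above; the "tomcat" user and "tomcat6" service name are assumptions — adjust them to your installation.

```shell
# Solr data folder from the stack trace above:
SOLR_DATA=/usr/share/solr-4.5.0/example/solr/collection1/data

# 1) give the user that runs Tomcat ownership of the data folder (as root):
#      chown -R tomcat:tomcat "$SOLR_DATA"
# 2) make sure that user can read and write it:
#      chmod -R u+rwX "$SOLR_DATA"
# 3) if a stale write.lock was left behind by a run under another user,
#    remove it while Tomcat is stopped:
#      rm -f "$SOLR_DATA/index/write.lock"
# 4) restart Tomcat:
#      service tomcat6 restart
echo "target: $SOLR_DATA"
```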
Re: Questions developing custom functionquery
Hello JT, what is the field and fieldType definition for resname? Can't you check how '/some example/data/here/2013/09/12/testing.text' is handled on the analysis page in the Solr Admin UI? On Fri, Oct 11, 2013 at 4:53 AM, Richard Lee rockiee...@gmail.com wrote: It seems what you got is the terms rather than the raw data. Maybe you should check the API docs for more details. On 2013-10-11 3:56 AM, JT handyrems...@gmail.com wrote: I'm running into some issues developing a custom function query. My goal is to be able to implement a custom sorting technique. I have a field defined called resname; it is a single-valued str. Example: <str name="resname">/some example/data/here/2013/09/12/testing.text</str> I would like to do a custom sort based on this resname field. Basically, I would like to parse out the date there (2013/09/12) and sort on that date. I've followed various tutorials - http://java.dzone.com/news/how-write-custom-solr - http://www.supermind.org/blog/756/how-to-write-a-custom-solr-functionquery I'm at the point where my code compiles, runs, executes, etc. Solr is happy with my code. I have classes that inherit from ValueSourceParser and ValueSource, etc. I've overridden parse and instantiated my class with a ValueSource: public ValueSource parse(FunctionQParser fqp) { return new MyCustomClass(fqp.parseValueSource()); } public class MyCustomClass extends ValueSource { ValueSource source; public MyCustomClass(ValueSource source) { this.source = source; } public FunctionValues getValues(Map context, AtomicReaderContext readerContext) { final FunctionValues sourceDV = source.getValues(context, readerContext); return new IntDocValues(this) { public int intVal(int doc) { // parse the value of resname here String value = sourceDV.strVal(doc); ...more stuff } }; } } The issue I'm running into is that my call to sourceDV.strVal(doc) only returns part of the field, not all of it. It appears to be very random. I guess my actual question is, how do I access/reference the EXACT RAW value of a field while writing a function query.
Do I need to change my ValueSource to a String, then somehow look up the field name while inside my getValues call? Is there a way to access the raw field data when referencing it as a FunctionValues? Maybe I'm going about this totally incorrectly? -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
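Since the FunctionValues plumbing is the part that already works, a standalone sketch of just the date-extraction logic may help (plain JDK, no Lucene classes; the class and method names are made up for illustration). It turns the date embedded in the path into a sortable integer:

```java
// Extracts the /yyyy/MM/dd/ portion of a resname-style path and turns it
// into a sortable int (e.g. 20130912). Paths without a date sort first as 0.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PathDateSortKey {
    private static final Pattern DATE =
        Pattern.compile("/(\\d{4})/(\\d{2})/(\\d{2})/");

    public static int sortKey(String path) {
        Matcher m = DATE.matcher(path);
        if (!m.find()) {
            return 0; // no date found in the path
        }
        return Integer.parseInt(m.group(1)) * 10000
             + Integer.parseInt(m.group(2)) * 100
             + Integer.parseInt(m.group(3));
    }
}
```

Inside intVal(doc) you would then return sortKey(sourceDV.strVal(doc)). Note that the "only part of the field" symptom usually means the source field is tokenized, which matches Richard's hint that you are seeing terms rather than the raw value; pointing the function at a non-analyzed (plain str) copy of the field should make strVal return the whole value.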
Re: Multiple schemas in the same SolrCloud ?
Thanks! My only doubt is: upload a new set of configuration files to the same configuration name, like so: Initial configuration: zkcli.sh -zkhost localhost:9983 -cmd upconfig -confdir conf_initial/ -confname my_custom_config and afterwards, to change it, do: zkcli.sh -zkhost localhost:9983 -cmd upconfig -confdir conf_changed/ -confname my_custom_config Is this correct? If so, what happens afterwards: will ZK distribute these changes to all cores and reload them? -- View this message in context: http://lucene.472066.n3.nabble.com/Multiple-schemas-in-the-same-SolrCloud-tp4094279p4094895.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Multiple schemas in the same SolrCloud ?
Here is a topic you should read: http://lucene.472066.n3.nabble.com/Reloading-config-to-zookeeper-td4021901.html 2013/10/11 maephisto my_sky...@yahoo.com Thanks! My only doubt is: upload a new set of configuration files to the same configuration name, like so: Initial configuration: zkcli.sh -zkhost localhost:9983 -cmd upconfig -confdir conf_initial/ -confname my_custom_config and afterwards, to change it, do: zkcli.sh -zkhost localhost:9983 -cmd upconfig -confdir conf_changed/ -confname my_custom_config Is this correct? If so, what happens afterwards: will ZK distribute these changes to all cores and reload them?
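To make the steps in the quoted question concrete: re-uploading a config with upconfig overwrites the files in ZooKeeper, but the cores do not reload by themselves; you then trigger a reload through the Collections API. A sketch (host, ports, and the collection name `mycollection` are assumptions):

```shell
# 1) re-upload the changed config under the same config name:
#      zkcli.sh -zkhost localhost:9983 -cmd upconfig \
#               -confdir conf_changed/ -confname my_custom_config
# 2) ask every core of the collection to pick it up:
RELOAD_URL="http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection"
#      curl "$RELOAD_URL"
echo "$RELOAD_URL"
```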
Solr Cloud Basic Authentication
I've deployed a SolrCloud cluster in Jetty 9 using Solr 4.4.0 and I would like to add some basic authentication. My question is: how can I provide the credentials so that they're used by the Collections API when creating a new collection, or by ZK? Are there any useful docs/wikis on this topic? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Cloud-Basic-Authentification-tp4094903.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Cloud Basic Authentication
For pre 4.x Solr (aka Solr 3.x) basic authentication works fine. Check this site: http://wiki.apache.org/solr/SolrSecurity Even master-slave replication architecture (*not* SolrCloud) works for me. There could be some problems with *cross-shard* queries etc. though (see SOLR-1861, SOLR-3421). I know I haven't answered your question but hopefully I have given you some more information on the subject. Best regards, Primož From: maephisto my_sky...@yahoo.com To: solr-user@lucene.apache.org Date: 11.10.2013 10:55 Subject:Solr Cloud Basic Authentification I've deployed a SolrCloud cluster in Jetty 9 using solr 4.4.0 and I would like to add some basic authentification. My question is how can I provide the credentials so that they're used in the collection API when creating a new collection or by ZK? Are there any useful docs/wiki on this topic? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Cloud-Basic-Authentification-tp4094903.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Please help!, Highlighting exact phrases with solr
Dear Koji, Thanks a lot for your answer and Sorry about my english I tried to configure FastVectorHighlighterhttp://wiki.apache.org/solr/HighlightingParameters#hl.useFastVectorHighlighter However, I have this error: lst name=error str name=msg fragCharSize(1) is too small. It must be 18 or higher. /str str name=trace java.lang.IllegalArgumentException: fragCharSize(1) is too small. It must be 18 or higher. at org.apache.lucene.search.vectorhighlight.BaseFragListBuilder.createFieldFragList(BaseFragListBuilder.java:51) at org.apache.lucene.search.vectorhighlight.WeightedFragListBuilder.createFieldFragList(WeightedFragListBuilder.java:38) at org.apache.lucene.search.vectorhighlight.FastVectorHighlighter.getFieldFragList(FastVectorHighlighter.java:195) at org.apache.lucene.search.vectorhighlight.FastVectorHighlighter.getBestFragments(FastVectorHighlighter.java:184) at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByFastVectorHighlighter(DefaultSolrHighlighter.java:588) at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:413) at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:139) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:365) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:722) /str int name=code500/int /lst /response Then, If I modify like this: (setHighlightFragsize(1) -- setHighlightFragsize(80)): SolrQuery solrQuery = 
new SolrQuery(); solrQuery.setQuery(queryEnt); solrQuery.set("collectionName", myCollection); solrQuery.addHighlightField("texto") .addHighlightField("titular") .setHighlightSnippets(50) .setHighlightFragsize(80); solrQuery.setHighlight(true); solrQuery.setHighlightRequireFieldMatch(true); solrQuery.set("hl.useFastVectorHighlighter", true); solrQuery.setHighlightSimplePre("<span class=\"item\">"); solrQuery.setHighlightSimplePost("</span>"); solrQuery.set("hl.usePhraseHighlighter", true); Then it works (the error disappears), but highlighting does not work :( : <lst name="highlighting"> <lst name="35254502"/> <lst name="35237409"/> </lst> <lst name="termVectors"> <str name="uniqueKeyFieldName">c_noticia</str> <lst name="warnings"> <arr name="noTermVectors"> <str>c_region</str> <str>c_idioma</str> <str>c_pais</str> <str>c_tipo</str> <str>c_categoria</str> <str>fecha_captura</str> <str>medio</str> <str>c_fuente_docu</str> </arr> </lst> <lst name="35254502"> <str name="uniqueKey">35254502</str> </lst> <lst name="35237409"> <str
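One hedged guess about why the snippet lists come back empty: the FastVectorHighlighter only works on fields indexed with term vectors, positions, and offsets, and the noTermVectors warnings in the response suggest at least some fields lack them. In schema.xml that would look roughly like this (field names taken from the mails; the type name is an assumption, and a full reindex is needed after the change):

```xml
<field name="texto"   type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
<field name="titular" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```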
Re: Multiple schemas in the same SolrCloud ?
Hi, kamaci. Does that mean I just need to upload new config files, and do not need to reload every node in SolrCloud, when I want to change my configurations? -- View this message in context: http://lucene.472066.n3.nabble.com/Multiple-schemas-in-the-same-SolrCloud-tp4094279p4094908.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Cloud Basic Authentication
Here is more information about security that you can use: http://wiki.apache.org/solr/SolrSecurity 2013/10/11 maephisto my_sky...@yahoo.com I've deployed a SolrCloud cluster in Jetty 9 using solr 4.4.0 and I would like to add some basic authentification. My question is how can I provide the credentials so that they're used in the collection API when creating a new collection or by ZK? Are there any useful docs/wiki on this topic? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Cloud-Basic-Authentification-tp4094903.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Cloud Basic Authentication
Thank you! I'm more interested in the SolrCloud architecture, with shards, shard replicas, and distributed indexing and search. These are the features I use and would like to protect with some basic authentication. I imagine that there must be a way to have this, otherwise anybody could mess with or even drop my collection. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Cloud-Basic-Authentification-tp4094903p4094911.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Please help!, Highlighting exact phrases with solr
Here is a similar question: http://search-lucene.com/m/vnMGKACGM1/%252218+or+higher.%2522subj=FastVectorHighlighter+and+hl+fragsize+parameter+set+to+zero+causes+exception and a related fixed issue: https://issues.apache.org/jira/browse/SOLR-1268 2013/10/11 Silvia Suárez s...@anpro21.com Dear Koji, Thanks a lot for your answer, and sorry about my English. I tried to configure FastVectorHighlighter http://wiki.apache.org/solr/HighlightingParameters#hl.useFastVectorHighlighter However, I got this error: fragCharSize(1) is too small. It must be 18 or higher.
Re: Solr Cloud Basic Authentication
One possible solution is to firewall access to the SolrCloud server(s). Only proxy/load-balancing servers should have unrestricted access to the Solr infrastructure. Then you can implement basic/advanced authentication on the proxy/LB side. Primož From: maephisto my_sky...@yahoo.com To: solr-user@lucene.apache.org Date: 11.10.2013 11:17 Subject: Re: Solr Cloud Basic Authentication Thank you! I'm more interested in the SolrCloud architecture, with shards, shard replicas, and distributed indexing and search. These are the features I use and would like to protect with some basic authentication. I imagine that there must be a way to have this, otherwise anybody could mess with or even drop my collection. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Cloud-Basic-Authentification-tp4094903p4094911.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Cloud Basic Authentication
Thank you, but I'm afraid that wiki page does not cover my topic of interest. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Cloud-Basic-Authentification-tp4094903p4094915.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Cloud Basic Authentication
If you want to deploy basic authentication so that a login is required when creating collections, it is simply a matter of constraining a URL pattern (e.g. /solr/admin/collections/*). Maybe this link will help: http://stackoverflow.com/questions/5323855/jetty-webserver-security/5332049#5332049 But keep in mind that intra-node requests in SolrCloud must also be authenticated (because the HTTP stack is used). If I understand correctly, this is currently not possible. Primož From: maephisto my_sky...@yahoo.com To: solr-user@lucene.apache.org Date: 11.10.2013 11:25 Subject: Re: Solr Cloud Basic Authentication Thank you, but I'm afraid that wiki page does not cover my topic of interest -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Cloud-Basic-Authentification-tp4094903p4094915.html Sent from the Solr - User mailing list archive at Nabble.com.
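The url-pattern approach from that StackOverflow answer looks roughly like this in Jetty's web.xml/webdefault.xml. This is a sketch only: the realm name, role, and pattern are assumptions, and as noted above it does not cover intra-node SolrCloud requests:

```xml
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Collections API</web-resource-name>
    <url-pattern>/admin/collections/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>solr-admin</role-name>
  </auth-constraint>
</security-constraint>
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>solr</realm-name>
</login-config>
```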
Cores with a lot of folders with prefix index.XXXXXXX
Hi, I have some cores with a lot of folders named index.X; my question is why. The collateral effect of this is shards that are 50% larger than their replicas on other nodes. Is there any way to delete these folders to free space? Is it a bug? /Yago - Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/Cores-with-lot-of-folders-with-prefix-index-XXX-tp4094920.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Cores with a lot of folders with prefix index.XXXXXXX
I think this is connected to replications being made. I also have quite a few of them, but currently I am not worried :) Primož From: yriveiro yago.rive...@gmail.com To: solr-user@lucene.apache.org Date: 11.10.2013 11:54 Subject: Cores with lot of folders with prefix index.XXX Hi, I have some cores with a lot of folders named index.X; my question is why. The collateral effect of this is shards that are 50% larger than their replicas on other nodes. Is there any way to delete these folders to free space? Is it a bug? /Yago - Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/Cores-with-lot-of-folders-with-prefix-index-XXX-tp4094920.html Sent from the Solr - User mailing list archive at Nabble.com.
SolrNet sample
I want to change the schema file of the SolrNet sample, add an XML file, and facet the data. What do I need to do in the sample file? Regards, Kishan Parmar Software Developer +91 95 100 77394 Jay Shree Krishnaa !!
Re: Cores with a lot of folders with prefix index.XXXXXXX
I have SSDs, therefore my space is like gold; I can't afford 30% of my space wasted on failed replications, or replications that are not cleaned up. The question for me is whether this is normal behaviour or a bug. If it is normal behaviour, I have a problem, because an SSD with more than 512G is expensive. -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Friday, October 11, 2013 at 11:03 AM, primoz.sk...@policija.si wrote: I think this is connected to replications being made? I also have quite some of them but currently I am not worried :)
Re: Please help!, Highlighting exact phrases with solr
Hi, Thanks for your answer, Furkan. I'm sorry, I don't understand the proposed solution... I did this: 1. eliminate the hl.useHighlighter parameter 2. introduce hl.useFastVectorHighlighter However, the result is the same... is something missing? Thanks a lot in advance for your help... Sil. *Tecnologías y SaaS para el análisis de marcas comerciales.* 2013/10/11 Furkan KAMACI furkankam...@gmail.com Here is a similar question: http://search-lucene.com/m/vnMGKACGM1/%252218+or+higher.%2522subj=FastVectorHighlighter+and+hl+fragsize+parameter+set+to+zero+causes+exception and a related fixed issue: https://issues.apache.org/jira/browse/SOLR-1268
Re: Cores with a lot of folders with prefix index.XXXXXXX
Do you have a lot of failed replications? Maybe those folders have something to do with this (please see the last answer at http://stackoverflow.com/questions/3145192/why-does-my-solr-slave-index-keep-growing ). If your disk space is valuable check index.properties file under data folder and try to determine which folders can be safely deleted. Primož From: Yago Riveiro yago.rive...@gmail.com To: solr-user@lucene.apache.org Date: 11.10.2013 12:13 Subject:Re: Cores with lot of folders with prefix index.XXX I have ssd's therefor my space is like gold, I can have 30% of my space waste in failed replications, or replications that are not cleaned. The question for me is if this a normal behaviour or is a bug. If is a normal behaviour I have a trouble because a ssd with more than 512G is expensive. -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Friday, October 11, 2013 at 11:03 AM, primoz.sk...@policija.si wrote: I think this is connected to replications being made? I also have quite some of them but currently I am not worried :)
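A sketch of the suggested check: the data folder's index.properties names the index.* folder actually in use, and every other index.* folder is a cleanup candidate. The layout below is mocked up in a temp directory for the demo (the folder names are made up); stop the node and take a backup before deleting anything for real.

```shell
# mock up a data dir like the ones discussed above:
DATA_DIR=$(mktemp -d)
mkdir -p "$DATA_DIR/index.20131011120000" "$DATA_DIR/index.20131010090000"
echo "index=index.20131011120000" > "$DATA_DIR/index.properties"

# the folder Solr is actually using, per index.properties:
ACTIVE=$(grep '^index=' "$DATA_DIR/index.properties" | cut -d= -f2)
echo "active: $ACTIVE"

# every other index.* folder is a candidate for cleanup:
for d in "$DATA_DIR"/index.*/; do
  name=$(basename "$d")
  [ "$name" != "$ACTIVE" ] && echo "candidate for deletion: $name"
done
rm -r "$DATA_DIR"
```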
Re: Cores with a lot of folders with prefix index.XXXXXXX
The thread that you point to is about master/slave replication. Is this issue valid in a SolrCloud context? I checked index.properties and indeed the variable index=index.X points to a folder; can the others be deleted without any scary side effects? -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Friday, October 11, 2013 at 11:31 AM, primoz.sk...@policija.si wrote: Do you have a lot of failed replications? Maybe those folders have something to do with this (please see the last answer at http://stackoverflow.com/questions/3145192/why-does-my-solr-slave-index-keep-growing ). If your disk space is valuable, check the index.properties file under the data folder and try to determine which folders can be safely deleted. Primož
Re: Cores with a lot of folders with prefix index.XXXXXXX
There are open issues related to extra index.XXX folders lying around if replication/recovery fails. See https://issues.apache.org/jira/browse/SOLR-4506 On Fri, Oct 11, 2013 at 4:06 PM, Yago Riveiro yago.rive...@gmail.com wrote: The thread that you point to is about master/slave replication. Is this issue valid in a SolrCloud context? I checked index.properties and indeed the variable index=index.X points to a folder; can the others be deleted without any scary side effects? -- Regards, Shalin Shekhar Mangar.
Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository
Not so hard switching it to Oracle JDK 7u40: just download it and change the JAVA_HOME path in /etc/default/jetty, so it's not necessary to switch the Java version with update-java-alternatives. The machine is 64-bit :) 2013/10/11 Bill Bell billnb...@gmail.com Does this work? I can suggest -XX:-UseLoopPredicate to switch off loop predicates. Which version of Java 7 is recommended? Bill Bell Sent from mobile On Oct 10, 2013, at 11:29 AM, Smiley, David W. dsmi...@mitre.org wrote: *Don't* use JDK 7u40; it's been known to cause index corruption and SIGSEGV faults with Lucene: LUCENE-5212. This has not gone unnoticed by Oracle. ~ David On 10/10/13 12:34 PM, Guido Medina guido.med...@temetra.com wrote: 2. Java version: There are huge performance wins between Java 5, 6 and 7; we use Oracle JDK 7u40.
Re: Cores with lot of folders with prefix index.XXXXXXX
Honestly I don't know for sure if you can delete them. Maybe make a backup, then delete them and see if it still works :) Replication works differently in the SolrCloud world, as far as I currently know. I don't think there are any additional index.* folders because fallback does not work in SolrCloud (someone correct me if I am wrong!). Primož
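The cleanup being discussed above can be sketched in shell. The directory layout and folder names below are hypothetical; the point is that index.properties in a core's data directory names the live index folder, and any other index.* folder is a candidate leftover. The sketch only prints candidates, it deletes nothing:

```shell
# Hypothetical data dir: index.properties names the live index folder.
DATA_DIR=$(mktemp -d)
mkdir -p "$DATA_DIR/index.20131011" "$DATA_DIR/index.20130901"
printf 'index=index.20131011\n' > "$DATA_DIR/index.properties"

# The folder named in index.properties is in use; other index.* folders
# are leftovers (e.g. from failed replications) and candidates for cleanup.
LIVE=$(grep '^index=' "$DATA_DIR/index.properties" | cut -d= -f2)
for d in "$DATA_DIR"/index.*; do
  name=$(basename "$d")
  [ "$name" = "$LIVE" ] && continue
  [ "$name" = "index.properties" ] && continue
  echo "stale: $name"   # prints "stale: index.20130901"
done
```

As the thread notes, back up before removing anything; SOLR-4506 tracks cases where such folders are left behind.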
Re: Cores with lot of folders with prefix index.XXXXXXX
Thanks, I guess I was wrong after all in my last post. Primož From: Shalin Shekhar Mangar shalinman...@gmail.com To: solr-user@lucene.apache.org Date: 11.10.2013 12:43 Subject: Re: Cores with lot of folders with prefix index.XXX There are open issues related to extra index.XXX folders lying around if replication/recovery fails. See https://issues.apache.org/jira/browse/SOLR-4506
Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository
So the main problem was that the libs must be copied to the WEB-INF/lib directory instead of the Jetty lib/ext directory. Is the fact that you should use WEB-INF/lib documented somewhere?
Re: Re: feedback on Solr 4.x LotsOfCores feature
bq: sharing the underlying solrconfig object the configset introduced in JIRA SOLR-4478 seems to be the solution for non-SolrCloud mode SOLR-4478 will NOT share the underlying config objects, it simply shares the underlying directory. Each core will, at least as presently envisioned, simply read the files that exist there and create their own solrconfig object. Schema objects may be shared, but not config objects. It may turn out to be relatively easy to do in the configset situation, but last time I looked at sharing the underlying config object it was too fraught with problems. bq: 15K cores is around 4 minutes I find this very odd. On my laptop, spinning disk, I think I was seeing 1k cores discovered/sec. You're seeing roughly 16x slower, so I have no idea what's going on here. If this is just reading the files, you should be seeing horrible disk contention. Are you on some kind of networked drive? bq: To do that in background and to block on that request until core discovery is complete, should not work for us (due to the worst case). What other choices are there? Either you have to do it up front or with some kind of blocking. Hmmm, I suppose you could keep some kind of custom store (DB? File? ZooKeeper?) that would keep the last known layout. You'd still have some kind of worst-case situation where the core you were trying to load wouldn't be in your persistent store and you'd _still_ have to wait for the discovery process to complete. bq: and we will use the cores Auto option to create load or only load the core on Interesting. I can see how this could all work without any core discovery but it does require a very specific setup. 
On Thu, Oct 10, 2013 at 11:42 AM, Soyez Olivier olivier.so...@worldline.com wrote: The corresponding patch for Solr 4.2.1 LotsOfCores can be found in SOLR-5316, including the new Cores options: - numBuckets, to create a subdirectory based on a hash on the corename % numBuckets in the core dataDir - Auto, with 3 different values: 1) false: default behaviour 2) createLoad: create, if it does not exist, and load the core on the fly on the first incoming request (update, select) 3) onlyLoad: load the core on the fly on the first incoming request (update, select), if it exists on disk Concerning: - sharing the underlying solrconfig object: the configset introduced in JIRA SOLR-4478 seems to be the solution for non-SolrCloud mode. We need to test it for our use case. If another solution exists, please tell me. We are very interested in such functionality and willing to contribute, if we can. - the possibility of LotsOfCores in SolrCloud: we don't know in detail how SolrCloud works, but one possible limit is the maximum number of entries that can be added to a ZooKeeper node. Maybe a solution would be just a kind of hashing in the ZooKeeper tree. - the time to discover cores in Solr 4.4: with a spinning disk under Linux, all cores with transient=true and loadOnStartup=false, and the Linux buffer cache empty before starting Solr, 15K cores takes around 4 minutes. It's linear in the number of cores, so for 50K it's more than 13 minutes. In fact, it corresponds to the time to read all core.properties files. Doing that in the background and blocking on a request until core discovery is complete will not work for us (due to the worst case). So we will just disable core discovery, because we don't need to know all cores from the start: start Solr without any core entries in solr.xml, and use the cores Auto option to create and load, or only load, the core on the fly, based on the existence of the core on disk (absolute path calculated from the core name).
Thanks for your interest, Olivier From: Erick Erickson [erickerick...@gmail.com] Sent: Monday, 7 October 2013 14:33 To: solr-user@lucene.apache.org Subject: Re: feedback on Solr 4.x LotsOfCores feature Thanks for the great writeup! It's always interesting to see how a feature plays out in the real world. A couple of questions though: bq: We added 2 Cores options: Do you mean you patched Solr? If so, are you willing to share the code back? If both are yes, please open a JIRA, attach the patch and assign it to me. bq: the number of file descriptors, it used a lot (need to increase global max and per process fd) Right, this makes sense since you have a bunch of cores all with their own descriptors open. I'm assuming that you hit a rather high max number and it stays pretty steady. bq: the overhead to parse solrconfig.xml and load dependencies to open each core Right, I tried to look at sharing the underlying solrconfig object but it seemed pretty hairy. There are some extensive comments in the JIRA on the problems I foresaw. There may be some action on this in the future. bq: lotsOfCores doesn't work with SolrCloud Right, we haven't concentrated on that, it's an interesting problem. In
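For readers unfamiliar with the options Olivier relies on: core discovery in Solr 4.4+ reads a per-core core.properties file, and transient/loadOnStartup are the standard keys that make a core lazily loadable. A minimal hypothetical example (the core name is made up):

```properties
# core.properties for a lazily loaded, transient core (hypothetical name)
name=customer_12345
transient=true
loadOnStartup=false
```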
Re: Find documents that are composed of % words
bq: but you cannot ask this to client. You _can_ ask this of a client. IMO you are obligated to. A gentle way to do that is to say something like: "Solr doesn't do that out-of-the-box. I estimate it will take me XXX weeks to implement that in custom code. I will be unable to make progress on features A-F during that time. We can try tweaking Solr's ranking with the standard configurations and see if that satisfies your ranking requirements in YYY days. Please prioritize this relative to the other features." I have, quite literally, been in very similar situations. The client was convinced that BM25 ranking would give better results (this was before flexible scoring). They never needed the BM25 stuff. And their project was wildly successful. It's amazing how often software people don't give this feedback and then the project managers are surprised later by time/cost overruns or lack of features. We _must_ inform our clients of the costs of a feature and cheaper alternatives before they can make informed decisions. It's also amazing how often, when given realistic cost estimates, features like this get put off forever. On those occasions when it _does_ make a difference, at least the client has the information necessary to prioritize, and their expectations are set appropriately. Rant done, Erick On Thu, Oct 10, 2013 at 3:03 PM, shahzad73 shahzad...@yahoo.com wrote: Yes, the correct answer may be "Why", but you cannot ask this of the client. He thinks there is something interesting with this formula, and if it works we can index websites with Nutch + Solr and let users input queries that can locate documents which have a given % of foreign words other than the list provided. I will check the answer provided. Shahzad -- View this message in context: http://lucene.472066.n3.nabble.com/Find-documents-that-are-composed-of-words-tp4094264p4094778.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository
I can report that Jetty is running now with these options: JAVA_OPTIONS=-Djava.awt.headless=true -Dfile.encoding=UTF-8 -Xms256m -Xmx256m -XX:+UseG1GC -XX:MaxGCPauseMillis=50 -XX:+OptimizeStringConcat -XX:+UseStringCache -Dsolr.solr.home=/usr/share/solr $JAVA_OPTIONS @Guido: I reduced the min/max heap size to 256m; I will increase this on the production server.
Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository
I can't tell for sure if that is documented somewhere. I did that straight away because of the years I have been developing Java webapps: a class-not-found usually means that some jar/class is missing somewhere. Because of all the issues I have seen with parent-child class loaders, my 1st choice is usually to make the jars/classes available to the relevant webapp classloader, in this case WEB-INF/lib of the Solr webapp; which, if running several webapps, will require more PermGen space, but in this case that is not a problem because there is only one webapp running, so there won't be several child class loaders loading the same set of classes from a jar. I have seen too many weird things with class loaders. Well, enough about class loading, I don't want to hijack the subject of this thread. HTH, Guido.
Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository
Remember the -server flag, which for Java webapps or dedicated Java services will improve things. Guido.
Re: Find documents that are composed of % words
Erick, agreed. The Solr + Nutch solution was proposed by myself, and I had never used these technologies; this is the first time I'm handling these two. My initial response to the client's requirements was to try to work with existing industry tools and then modify them according to the client's requirements, instead of re-inventing the wheel. I started from zero to this point and was not even aware Solr could handle this sort of requirement. Now all the infrastructure is there (crawler + index and an app to make searches); it's just this base requirement to fulfill. At the moment I am moving in the dark configuring Solr to handle this requirement. Here is what I am thinking of doing: develop a filter which is called at search time on a field that holds all tokens for the page. It will determine how many tokens (words) match the criteria words and which tokens remain, get the total number of tokens for the document, and produce the % ratio of matched to unmatched. Not sure the above solution will work, so I need suggestions. -- View this message in context: http://lucene.472066.n3.nabble.com/Find-documents-that-are-composed-of-words-tp4094264p4094953.html Sent from the Solr - User mailing list archive at Nabble.com.
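The ratio described above (matched tokens over total tokens) is simple to state precisely; this is a plain shell sketch of the arithmetic only, not Solr filter code, and the token and criteria lists are made up:

```shell
# Count how many of a document's tokens appear in a criteria word list,
# then report matched/total. All data here is illustrative.
TOKENS="the quick brown fox jumps over the lazy dog"
CRITERIA="the fox dog"
total=0
matched=0
for t in $TOKENS; do
  total=$((total + 1))
  for c in $CRITERIA; do
    # count this token as matched on the first criteria hit
    [ "$t" = "$c" ] && matched=$((matched + 1)) && break
  done
done
echo "$matched/$total"   # prints "4/9"
```

A real implementation would do this per document against the indexed terms of the field, which is where the analysis-chain questions in this thread come in.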
Re: Find documents that are composed of % words
Aloke Ghoshal, I'm trying to work out your equation. I am using the standard schema provided by Nutch for Solr and am not aware of how to calculate myfieldwordcount in the first query; no idea where this count will come from. Is there any filter that will store the number of tokens generated for a specific field and store it as another field? That way we could use it. I'm also not sure what norm does in the second equation; I tried to find information on this online and have not found any yet. Please explain. Shahzad -- View this message in context: http://lucene.472066.n3.nabble.com/Find-documents-that-are-composed-of-words-tp4094264p4094955.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository
@Guido: I tried it before, and then I thought you marked just the server options. Because the -server causes a: sudo service jetty start * Starting Jetty servlet engine. jetty Invalid option -server Cannot parse command line arguments Or should I substitute server with ...? Options with -server: JAVA_OPTIONS=-Djava.awt.headless=true -Dfile.encoding=UTF-8 -server -Xms256m -Xmx256m -XX:+UseG1GC -XX:MaxGCPauseMillis=50 -XX:+OptimizeStringConcat -XX:+UseStringCache -Dsolr.solr.home=/usr/share/solr $JAVA_OPTIONS
Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository
It is a JVM parameter, example: JAVA_OPTIONS=-Djava.awt.headless=true -Dfile.encoding=UTF-8 -server -Xms256m -Xmx256m If you want to concatenate more JVM parameters you do it like this: JAVA_OPTIONS=-Dsolr.solr.home=/usr/share/solr $JAVA_OPTIONS Take a good look at the format, Guido.
Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository
Strange. When I add -server to the arguments, I get the error every time on Jetty startup: Invalid option -server Cannot parse command line arguments
Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository
Oh, I got it ( http://stackoverflow.com/a/5273166/326905 ): at least 2 cores and at least 2 GB physical memory. Until now I'm using a VM with a single core and 1 GB RAM, so this will come later on the production server :) Thank you Guido.
SolrCloud on SSL
I have 3 SolrCloud nodes (call them idx1, idx2, idx3), and the boxes have SSL certs configured on them to protect the Solr indexes. Right now, I can do queries on idx1 and it works fine. If I try to query on idx3, I get: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: IOException occurred when talking to server at http://idx1:8443/solr/test1 (and then a long stack trace -- can't copy it, on a test network) Is there a spot in a Solr configuration where I can set this up to use HTTPS? Let me know if you need more information to determine the problem. Thanks! -- Chris
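One likely explanation, offered tentatively since the stack trace is abbreviated: each SolrCloud node registers its own base URL in ZooKeeper, and if the nodes registered with http:// the inter-node requests bypass the HTTPS setup, which matches the http://idx1:8443 URL in the error. In Solr releases that support SSL in SolrCloud, the cluster-wide scheme is set via the urlScheme cluster property; the zkcli.sh path and ZooKeeper address below are hypothetical, so check your release's SSL documentation before relying on this:

```shell
# Tell SolrCloud nodes to address each other over https
# (version-dependent; zk1:2181 is a placeholder for your ZK ensemble).
./cloud-scripts/zkcli.sh -zkhost zk1:2181 -cmd clusterprop -name urlScheme -val https
```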
Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository
If your single-core machine is 32-bit, use Oracle JDK 7u25 or Ubuntu OpenJDK 7; JDK 7u40 for 32-bit will corrupt indexes, as stated in the Lucene bug report. Guido.
On 11/10/13 12:26, Peter Schmidt wrote: I can report that jetty is running now with these options: JAVA_OPTIONS=-Djava.awt.headless=true -Dfile.encoding=UTF-8 -Xms256m -Xmx256m -XX:+UseG1GC -XX:MaxGCPauseMillis=50 -XX:+OptimizeStringConcat -XX:+UseStringCache -Dsolr.solr.home=/usr/share/solr $JAVA_OPTIONS @Guido: I reduced the min/max heap size to 256m, I will increase this on the production server. 2013/10/11 Peter Schmidt peter.schmidt0...@gmail.com So the main problem was that the libs must be copied to the WEB-INF/lib directory instead of the jetty lib/ext directory. Is the fact that you should use WEB-INF/lib documented somewhere? 2013/10/11 Peter Schmidt peter.schmidt0...@gmail.com Not so hard switching it to Oracle JDK 7u40. Just download it and change the JAVA_HOME path in /etc/default/jetty, so it's not necessary to switch the Java version with update-java-alternatives. The machine is 64bit :) 2013/10/11 Bill Bell billnb...@gmail.com Does this work? I can suggest -XX:-UseLoopPredicate to switch off predicates. ??? Which version of 7 is recommended? Bill Bell Sent from mobile On Oct 10, 2013, at 11:29 AM, Smiley, David W. dsmi...@mitre.org wrote: *Don't* use JDK 7u40, it's been known to cause index corruption and SIGSEGV faults with Lucene: LUCENE-5212 This has not gone unnoticed by Oracle. ~ David On 10/10/13 12:34 PM, Guido Medina guido.med...@temetra.com wrote: 2. Java version: There are huge performance wins between Java 5, 6 and 7; we use Oracle JDK 7u40.
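The JAVA_OPTIONS handling debated in this thread is ordinary shell variable expansion, so ordering matters: the base options must be assigned before the line that re-expands $JAVA_OPTIONS. A minimal sketch of an /etc/default/jetty-style fragment (the option values are the examples from the thread, not recommendations):

```shell
# Base JVM options first (example values from the thread above).
JAVA_OPTIONS="-Djava.awt.headless=true -Dfile.encoding=UTF-8 -Xms256m -Xmx256m"
# Append further options by re-expanding the variable; if this line ran
# before the one above, the $JAVA_OPTIONS expansion would be empty.
JAVA_OPTIONS="-Dsolr.solr.home=/usr/share/solr $JAVA_OPTIONS"
echo "$JAVA_OPTIONS"
```

As the thread notes, -server belongs in the same string, but only a 64-bit (or otherwise server-capable) JVM will accept it; the 32-bit client JVM rejects it with "Invalid option -server".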
Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository
no it is 64bit and just a development VM. In production the solr will use multicore, also 64bit and some gb ram. 2013/10/11 Guido Medina guido.med...@temetra.com If your single core is at 32bits use Oracle JDK 7u25 or Ubuntu Open JDK 7, the JDK 7u40 for 32bits will corrupt indexes as stated on the lucene bug report. Guido.
Problems using DataImportHandler and TikaEntityProcessor
Starting Solr with the command line java -Dsolr.solr.home=example-DIH/solr -jar start.jar and then trying to import some data with java -Durl=http://localhost:8983/solr/tika/update -Dtype=application/pdf -jar post.jar *.pdf fails with error SimplePostTool: WARNING: Solr returned an error #400 Bad Request SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 400 for URL: http://localhost:8983/solr/tika/update These are all valid PDFs that I have previously been able to import with Solr Cell. What am I doing wrong? Dr Peter J Bleackley Computational Linguistics Contractor Playful Technology Ltd
Re: Multiple schemas in the same SolrCloud ?
Upload the new configuration and then use the Collections API to reload your collection: https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-ReloadaCollection -- View this message in context: http://lucene.472066.n3.nabble.com/Multiple-schemas-in-the-same-SolrCloud-tp4094279p4094978.html Sent from the Solr - User mailing list archive at Nabble.com.
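As a hedged sketch of the reload call described on the linked page (host, port and the collection name "mycollection" are placeholders to adapt):

```shell
# Placeholders: adjust host, port and collection name for your cluster.
collection="mycollection"
url="http://localhost:8983/solr/admin/collections?action=RELOAD&name=${collection}"
echo "$url"
# Against a live SolrCloud node you would then issue: curl "$url"
```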
Re: Questions developing custom functionquery
Hey Mikhail, Thanks for responding. Field: resourcename Field-Type: org.apache.solr.schema.TextField All 9 boxes checked (indexed, tokenized, stored). I have various other fields (including MD5 checksums) in my schema. When I use an md5sum field (which is a str field, but doesn't have spaces, forward slashes, etc.) the plugin I've written performs exactly as I expected. I think the large part of my problem is that my ValueSource is being instantiated as the class StrFieldSource. When you call getValues on a StrFieldSource, you end up with a DocTermsIndexDocValues (http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-queries/4.3.0/org/apache/lucene/queries/function/docvalues/DocTermsIndexDocValues.java#DocTermsIndexDocValues). Calling getVal() on a DocTermsIndexDocValues does some really weird stuff that I really don't understand. I assumed that calling ValueSource.getValues(...).strVal(int doc) would simply return the data that my field corresponds to, but I don't think that is true. It's possible I'm going about this wrong and need to redo my approach; I'm just currently at a loss for what that approach is. On Fri, Oct 11, 2013 at 2:48 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Hello JT, what is the field and fieldType definition for resname? Can't you check how '/some example/data/here/2013/09/12/ testing.text ' is handled on the analysis page in SolrAdmin? On Fri, Oct 11, 2013 at 4:53 AM, Richard Lee rockiee...@gmail.com wrote: seems what u got is the terms rather than the raw data. maybe u should check the api docs for more details 2013-10-11 3:56 AM, JT handyrems...@gmail.com wrote: I'm running into some issues developing a custom functionquery. My goal is to be able to implement a custom sorting technique. I have a field defined called resname, it is a single-value str. Example: str name=resname/some example/data/here/2013/09/12/testing.text/str I would like to do a custom sort based on this resname field.
Basically, I would like to parse out that date there (2013/09/12) and sort on that date. I've followed various tutorials - http://java.dzone.com/news/how-write-custom-solr - http://www.supermind.org/blog/756/how-to-write-a-custom-solr-functionquery I'm at the point where my code compiles, runs, executes, etc. Solr is happy with my code. I have classes that inherit from ValueSourceParser and ValueSource, etc. I've overridden parse and instantiated my class with a ValueSource: public ValueSource parse(FunctionQParser fqp) { return new MyCustomClass(fqp.parseValueSource()); } public class MyCustomClass extends ValueSource { ValueSource source; public MyCustomClass(ValueSource source) { this.source = source; } public FunctionValues getValues(Map context, AtomicReaderContext readerContext) { final FunctionValues sourceDV = source.getValues(context, readerContext); return new IntDocValues(this) { public int intVal(int doc) { // parse the value of resname here String value = sourceDV.strVal(doc); ...more stuff } }; } } The issue I'm running into is that my call to sourceDV.strVal(doc) only returns part of the field, not all of it. It appears to be very random. I guess my actual question is, how do I access / reference the EXACT RAW value of a field while writing a functionquery. Do I need to change my ValueSource to a String?, then somehow look up the field name while inside my getValues call? Is there a way to access the raw field data when referencing it as a FunctionValues? Maybe I'm going about this totally incorrectly? -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
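Outside Solr, the date-parsing step the poster wants inside intVal() can be sketched on its own: pull the YYYY/MM/DD segment out of each resname-style path and prefix it as a sortable YYYYMMDD integer (the extra paths below are invented examples, not from the thread):

```shell
paths='/some example/data/here/2013/09/12/testing.text
/other/archive/2012/01/03/a.text
/misc/files/2014/05/20/b.text'
# Rewrite each line as "YYYYMMDD <original path>", then sort numerically
# on that leading integer -- the same keying a custom sort would use.
sorted=$(printf '%s\n' "$paths" \
  | sed -E 's|^(.*/)([0-9]{4})/([0-9]{2})/([0-9]{2})/(.*)$|\2\3\4 \1\2/\3/\4/\5|' \
  | sort -n)
printf '%s\n' "$sorted"
```

The same extraction (regex on the stored string, then compare the resulting integer) is what the intVal() body above would need to do once it can see the whole field value.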
Re: Problems using DataImportHandler and TikaEntityProcessor
There may be a problem with your schema. Could you send your solr logs? 2013/10/11 Peter Bleackley bleackl...@zooey.co.uk Starting Solr with the command line java -Dsolr.solr.home=example-DIH/solr -jar start.jar and then trying to import some data with java -Durl=http://localhost:8983/solr/tika/update -Dtype=application/pdf -jar post.jar *.pdf fails with error SimplePostTool: WARNING: Solr returned an error #400 Bad Request SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 400 for URL: http://localhost:8983/solr/tika/update These are all valid PDFs that I have previously been able to import with Solr Cell. What am I doing wrong? Dr Peter J Bleackley Computational Linguistics Contractor Playful Technology Ltd
Re: SolrCloud on SSL
On 10/11/2013 8:17 AM, Christopher Gross wrote: I have 3 SolrCloud nodes (call them idx1, idx2, idx3), and the boxes have SSL certs configured on them to protect the Solr indexes. Right now, I can do queries on idx1 and it works fine. If I try to query on idx3, I get: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: IOException occurred when talking to server at http://idx1:8443/solr/test1 (and then a long stack trace -- can't copy it, on a test network) Is there a spot in a Solr configuration that I can set this up to use HTTPS? From what I can tell, not yet. https://issues.apache.org/jira/browse/SOLR-3854 https://issues.apache.org/jira/browse/SOLR-4407 https://issues.apache.org/jira/browse/SOLR-4470 I'm wondering why you want to do this, though. It adds extra CPU overhead. Perhaps not a lot, but it's not free. As for protecting Solr against eavesdropping, is it in a location where that's possible? The bottom line is this: People that you cannot trust should not have direct access to Solr. It should be firewalled so only trusted personnel and applications can talk to it. Anyone who has direct access to Solr can change your index, delete your index, and send denial of service queries. If you take steps to block access to the update handler(s) and the admin UI, denial of service queries are still possible. Blocking access to the update handlers and admin UI is not something Solr itself can do - that's a job for the servlet container. Related general issue: The /browse handler included in the example (which utilizes code written in velocity) requires that the user have direct access to Solr. This makes its very design insecure. That handler is intended as a demonstration of Solr's capabilities and how to use them, it's not for production. Thanks, Shawn
Re: Cores with lot of folders with prefix index.XXXXXXX
On 10/11/2013 4:36 AM, Yago Riveiro wrote: The thread that you point is about master / slave - replication, Is this issue valid on SolrCloud context? I check the index.properties and indeed the variable index=index.X point to a folder, the others can be deleted without any scary side effect? SolrCloud uses traditional replication behind the scenes as a last resort to recover an index when there's some kind of failure, or when it determines that things are too far out of sync after a Solr restart, or when adding replicas. During normal operation, traditional replication is *NOT* used. If you are getting a lot of index. directories, this may be an indication of an underlying issue, unless you are testing things and doing a lot of Solr restarts, in which case it may be expected. The index.properties file may be one way to go. I would want to be absolutely sure before deleting directories. You should be able to manually check which index directory Solr is using (with tools like lsof for Linux or Process Explorer for Windows) and delete the others. Thanks, Shawn
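Before removing any stale index.XXXXXXX directories, the index.properties check Shawn mentions can be sketched like this (the data directory below is a throwaway temp dir standing in for your core's real data directory):

```shell
datadir=$(mktemp -d)                       # stand-in for your core's data dir
printf 'index=index.20131011\n' > "$datadir/index.properties"
# The active directory is whatever the index= property names; any other
# index.* sibling directory is then a cleanup candidate (after double-
# checking with lsof / Process Explorer that Solr has no handles in it).
active=$(sed -n 's/^index=//p' "$datadir/index.properties")
echo "active: $active"
```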
Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository
Then I think you downloaded the wrong JDK 7 (32bits JDK?), if you are running JDK 7 64bits the -server flag should be recognized. According to the stackoverflow link you mentioned before. Guido. On 11/10/13 15:48, Peter Schmidt wrote: no it is 64bit and just a development VM. In production the solr will use multicore, also 64bit and some gb ram.
Solr Slave warning: No content recieved for file
Hello. We are running a master-slave solr 3.x and we are seeing more and more of this in the slave log file: Oct 10, 2013 10:17:00 PM org.apache.solr.handler.SnapPuller$FileFetcher fetchPackets WARNING: No content recieved for file: {name=_56l.prx, lastmodified=1381443413000, size=0} Is this something we should worry about? Note that we are running some deleteDocByQuery commands on the master. Thanks. Arcadius.
SOLR Cloud on JBOSS
Hello - This wiki page is gone - https://wiki.apache.org/solr/SolrCloud%20using%20Jboss I have been able to configure an external instance of Zookeeper, and an instance of SOLR in JBOSS.. But I am unsure how to point my SOLR instance to the ZK instance and upload the configuration. All the examples I have found, show using script parameters to start SOLR rather than using a container like JBOSS. Can someone point me in the right direction? Thanks! Jeremy D. Branham Performance Technologist II Sprint University Performance Support Fort Worth, TX | Tel: **DOTNET http://jeremybranham.wordpress.com/ http://www.linkedin.com/in/jeremybranham This e-mail may contain Sprint proprietary information intended for the sole use of the recipient(s). Any use by others is prohibited. If you are not the intended recipient, please contact the sender and delete all copies of the message.
Re: Problems using DataImportHandler and TikaEntityProcessor
kamaci wrote There may be a problem with your schema. Could you send your solr logs? 2013/10/11 Peter Bleackley <bleackleyp@.co> Starting Solr with the command line java -Dsolr.solr.home=example-DIH/solr -jar start.jar and then trying to import some data with java -Durl=http://localhost:8983/solr/tika/update -Dtype=application/pdf -jar post.jar *.pdf fails with error SimplePostTool: WARNING: Solr returned an error #400 Bad Request SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 400 for URL: http://localhost:8983/solr/tika/update These are all valid PDFs that I have previously been able to import with Solr Cell. What am I doing wrong? Dr Peter J Bleackley Computational Linguistics Contractor Playful Technology Ltd 11228 [qtp1831924725-17] INFO org.apache.solr.update.processor.LogUpdateProcessor – [tika] webapp=/solr path=/update params={} {} 0 0 11229 [qtp1831924725-17] ERROR org.apache.solr.core.SolrCore – org.apache.solr.common.SolrException: Unsupported ContentType: application/pdf Not in: [application/xml, text/csv, text/json, application/csv, application/javabin, text/xml, application/json] at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:86) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:368) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:724) I 
tried changing the options to -Dauto -Dfiletypes=pdf. This gave me a 404 error, apparently caused by post.jar adding /extract to the end of the URL -- View this message in context: http://lucene.472066.n3.nabble.com/Problems-using-DataImportHandler-and-TikaEntityProcessor-tp4094983p4094987.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository
On 10/11/2013 4:55 AM, Peter Schmidt wrote: So the main problem was that the libs must be copied to the WEB-INF/lib directory insteed of the jetty lib/ext directory. Is the fact that you should you use WEB-INF/lib somewhere documented? Actually, jetty's lib/ext is preferred, modifying the .war file is NOT recommended. Solr used to ship with the logging jars in the .war file, similar to the result that Guido's procedure gives you. http://wiki.apache.org/solr/SolrLogging#What_changed This was changed in version 4.3.0 because many people were having to take manual steps to change logging frameworks. There is a strong preference among people who really care about logging for using log4j or logback instead of java.util.logging. Now nobody needs to compile Solr themselves or perform surgery on the .war file when they want to change their logging, and the default produces much better results. Thanks, Shawn
Re: SolrCloud on SSL
On Fri, Oct 11, 2013 at 11:08 AM, Shawn Heisey s...@elyograg.org wrote: On 10/11/2013 8:17 AM, Christopher Gross wrote: Is there a spot in a Solr configuration that I can set this up to use HTTPS? From what I can tell, not yet. https://issues.apache.org/jira/browse/SOLR-3854 https://issues.apache.org/jira/browse/SOLR-4407 https://issues.apache.org/jira/browse/SOLR-4470 Dang. I'm wondering why you want to do this, though. It adds extra CPU overhead. Perhaps not a lot, but it's not free. As for protecting Solr against eavesdropping, is it in a location where that's possible? The bottom line is this: People that you cannot trust should not have direct access to Solr. It should be firewalled so only trusted personnel and applications can talk to it. Oh, they should be firewalled, but I can't (yet) with the existing network architecture. It's out of my direct control -- I'm just trying to stay one step ahead of the game. Anyone who has direct access to Solr can change your index, delete your index, and send denial of service queries. If you take steps to block access to the update handler(s) and the admin UI, denial of service queries are still possible. Blocking access to the update handlers and admin UI is not something Solr itself can do - that's a job for the servlet container. Related general issue: The /browse handler included in the example (which utilizes code written in velocity) requires that the user have direct access to Solr. This makes its very design insecure. That handler is intended as a demonstration of Solr's capabilities and how to use them, it's not for production. Good to know, I'll make sure that I've bumped this in my configs. Thanks!
Re: SOLR Cloud on JBOSS
On 10/11/2013 9:24 AM, Branham, Jeremy [HR] wrote: This wiki page is gone - https://wiki.apache.org/solr/SolrCloud%20using%20Jboss I have been able to configure an external instance of Zookeeper, and an instance of SOLR in JBOSS.. But I am unsure how to point my SOLR instance to the ZK instance and upload the configuration. All the examples I have found, show using script parameters to start SOLR rather than using a container like JBOSS. With version 4.4.0, you can put the zkHost parameter required to turn SolrCloud mode on in your solr.xml file. This is the case whether you use the new solr.xml format or the old solr.xml format. With versions 4.3.0 and older (which can only use the old solr.xml format), there was a bug that prevented this parameter from working correctly in solr.xml. Alternatively, you can use whatever mechanism JBoss provides for setting java system properties to set the zkHost parameter. As for uploading configurations, I strongly recommend that you do not do this with startup parameters, but rather do it with the command-line zookeeper utility. The example includes scripts for using this utility, but those scripts rely pretty heavily on the example jetty. Here's a reference that shows how to use it directly, but you must know where JBoss extracted the war file so you can use the correct classpath argument: https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities Thanks, Shawn
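For the solr.xml route Shawn describes, a hedged sketch of a 4.4-style solr.xml carrying the zkHost parameter (the ZooKeeper host names and the /solr chroot are placeholders; verify the exact element names against the reference guide for your Solr version):

```xml
<solr>
  <solrcloud>
    <!-- Placeholder ZooKeeper ensemble; adjust hosts and chroot. -->
    <str name="zkHost">zk1:2181,zk2:2181,zk3:2181/solr</str>
  </solrcloud>
</solr>
```

The alternative Shawn mentions is setting the same value as a JVM system property (-DzkHost=...) through whatever mechanism JBoss provides.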
Re: SolrCloud on SSL
You could resolve that with SSH tunnels. Autossh with the right parameters works like a charm. HTH, Guido.
Re: What's the purpose of the bits option in compositeId (Solr 4.5)?
Thanks folks, As an update for future readers --- the problem was on my side (my logic in picking the _route_ was flawed) as expected. :) On Tue, Oct 8, 2013 at 7:35 PM, Yonik Seeley ysee...@gmail.com wrote: On Tue, Oct 8, 2013 at 8:27 PM, Shawn Heisey s...@elyograg.org wrote: There is also the distrib=false parameter that will cause the request to be handled directly by the core it is sent to rather than being distributed/balanced by SolrCloud. Right - this is probably the best option for diagnosing what is in what index. -Yonik
Re: Problems using DataImportHandler and TikaEntityProcessor
Here is a similar conversation: http://search-lucene.com/m/GeXcg1YfgQ32/Re%253A+Solr+4.0+error+message%253A+%2522Unsupported+ContentType%253A+Content-type%253Atext%252Fxml%2522subj=Re+Solr+4+0+error+message+Unsupported+ContentType+Content+type+text+xml+ Could you change -Dauto into -Dtype=application/pdf and try it again?
Re: Problems using DataImportHandler and TikaEntityProcessor
On 10/11/2013 9:32 AM, PeteBleackley wrote: I tried changing the options to -Dauto -Dfiletypes=pdf. This gave me a 404 error, apparently caused by post.jar adding /extract to the end of the URL In order to use post.jar, you would need the /update/extract handler, which is not defined in the tika core under example-DIH. The example-DIH configurations are intended to use and illustrate the dataimport handler - documents are imported using the /dataimport handler and its config file, not sent directly with post.jar. Here's a page covering what you would need in order to send PDFs directly rather than import them using DIH: http://wiki.apache.org/solr/ExtractingRequestHandler Thanks, Shawn
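For reference, the /update/extract handler Shawn mentions is defined in the stock example's solrconfig.xml roughly as below. This is a sketch: the lib paths and the fmap/uprefix defaults are assumptions that vary by installation layout, and the tika core under example-DIH omits this handler entirely.

```xml
<!-- Sketch of the Solr Cell handler definition (not present in the
     example-DIH tika core). Lib paths are relative to the core's
     instanceDir and will differ per layout. -->
<lib dir="../../../contrib/extraction/lib" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-cell-\d.*\.jar" />

<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- where the extracted body lands; field name is an assumption -->
    <str name="fmap.content">text</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>
```

With a handler like this in place, the /extract URL that post.jar's -Dauto mode was appending would actually resolve.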
Re: Question about plug-in update handler failure
Issue resolved. Not a Solr issue; a really hard-to-discover missing library in my installation. On Thu, Oct 10, 2013 at 7:10 PM, Jack Park jackp...@topicquests.org wrote: I have an interceptor which grabs SolrDocument instances in the update handler chain. It feeds those documents as a JSON string out to an agent system. That system has been running fine all the way up to Solr 4.3.1. I have discovered that, as of 4.4 and now 4.5, the very same config files, agent jar, and test harness show that no documents are intercepted, even though the index is built. I am wondering if I missed something in changes to Solr beyond 4.3.1 that would invalidate my setup. For the record, earlier trials opened the war and dropped my agent jar into WEB-INF/lib; the most recent trials on all systems leave the war intact and drop the agent jar into collection1/lib -- it still works on 4.3.1, but nothing beyond that. Many thanks in advance for any thoughts. Jack
RE: Solr 4.4 - Master/Slave configuration - Replication Issue with Commits after deleting documents using Delete by ID
Hi Otis, Thanks for the response. The log files can be found here. Master log: http://pastebin.com/DPLKMPcF Slave log: http://pastebin.com/DX9sV6Jx One more point worth mentioning: when we issue the commit with expungeDeletes=true, the delete-by-id replication is successful, i.e. http://localhost:8983/solr/annotation/update?commit=true&expungeDeletes=true Regards, Bharat Akkinepalli -----Original Message----- From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] Sent: Wednesday, October 09, 2013 6:35 PM To: solr-user@lucene.apache.org Subject: Re: Solr 4.4 - Master/Slave configuration - Replication Issue with Commits after deleting documents using Delete by ID Bharat, Can you look at the logs on the Master when you issue the delete and the subsequent commits and share that? Otis -- Solr & ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Tue, Oct 8, 2013 at 3:57 PM, Akkinepalli, Bharat (ELS-CON) b.akkinepa...@elsevier.com wrote: Hi, We have recently migrated from Solr 3.6 to Solr 4.4. We are using the Master/Slave configuration in Solr 4.4 (not SolrCloud). We have noticed the following behavior/defect.

Configuration:
===
1. Hard commit and soft commit are disabled in the configuration (we control the commits from the application).
2. We have 1 master and 2 slaves configured, and the pollInterval is configured to 10 minutes.
3. The master is configured with replicateAfter set to commit and startup.

Steps to reproduce the problem:
===
1. Delete a document in Solr (using delete by id). URL - http://localhost:8983/solr/annotation/update with body as <delete><id>change.me</id></delete>
2. Issue a commit on the master (http://localhost:8983/solr/annotation/update?commit=true).
3. The replication of the DELETE WILL NOT happen. The master and slave have the same index version.
4. If we issue another commit on the master, we see that it replicates fine.

Request you to please confirm if this is a known issue. Thank you.
Regards, Bharat Akkinepalli
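For anyone following along, the reproduction steps and the workaround from this thread can be written out as commands (host, core name, and document id taken from the report; a sketch, not re-verified against 4.4):

```shell
# 1. Delete a document by id on the master:
curl http://localhost:8983/solr/annotation/update \
     -H 'Content-Type: text/xml' \
     --data-binary '<delete><id>change.me</id></delete>'

# 2. Commit on the master -- per the report, slaves do NOT pick up the delete:
curl 'http://localhost:8983/solr/annotation/update?commit=true'

# 3. Workaround from the thread: an expunging commit replicates correctly:
curl 'http://localhost:8983/solr/annotation/update?commit=true&expungeDeletes=true'
```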
Re: Using split in updateCSV for SolrCloud 4.4
Interestingly this URL by Jack works:

1. curl 'http://localhost/solr/prodinfo/update/csv?commit=true&f.merchantList.split=true&f.merchantList.separator=%3A&f.merchantList.encapsulator=%22&stream.contentType=text/csv&stream.file=/tmp/test.csv'

But this doesn't (i.e. it doesn't split the column):

2. curl 'http://localhost/solr/prodinfo/update/csv?commit=true&f.merchantList.split=true&f.merchantList.separator=%3A&f.merchantList.encapsulator=%22&escape=\&stream.contentType=text/csv&stream.file=/data/dump/catalog.txt'

The only difference was escape=\. I added that to Jack's example and it didn't work either. So the culprit was escape=\, not sure why. Thanks, -Utkarsh

On Thu, Oct 10, 2013 at 6:11 PM, Yonik Seeley ysee...@gmail.com wrote: Perhaps try adding echoParams=all to check that all of the input params are being parsed as expected. -Yonik

On Thu, Oct 10, 2013 at 8:10 PM, Utkarsh Sengar utkarsh2...@gmail.com wrote: Didn't help. This is the complete data: https://gist.github.com/utkarsh2012/6927649 (see merchantList column). I tried this URL: curl 'http://localhost/solr/coll1/update/csv?commit=true&f.merchantList.split=true&f.merchantList.separator=%3A&f.merchantList.encapsulator=%22&escape=\&stream.contentType=text/csv&stream.file=/data/dump/log_20130101' Can this be a bug in the UpdateCSV split function? Thanks, -Utkarsh

On Thu, Oct 10, 2013 at 3:11 PM, Jack Krupansky j...@basetechnology.com wrote: Using the standard Solr example for Solr 4.5, the following works, splitting the features CSV field into multiple values:

curl 'http://localhost:8983/solr/update/csv?commit=true&f.features.split=true&f.features.separator=%3A&f.features.encapsulator=%22' -H "Content-Type: text/csv" -d '
id,name,features
doc-1,doc1,feat1:feat2'

You may need to add stream.contentType=text/csv to your command. -- Jack Krupansky

-----Original Message----- From: Utkarsh Sengar Sent: Thursday, October 10, 2013 4:51 PM To: solr-user@lucene.apache.org Subject: Using split in updateCSV for SolrCloud 4.4 Hello, I am trying to use split (http://wiki.apache.org/solr/UpdateCSV#split) while loading some CSV data via updateCSV. This is the field:

<field name="merchantList" type="string" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="false" termPositions="false" termOffsets="false"/>

This is the column in CSV (merchantList): values,16179:10950,...values...

This is the URL I call: http://localhost/solr/coll1/update/csv?commit=true&f.merchantList.split=true&f.merchantList.separator=%3A&f.merchantList.encapsulator=&escape=\&stream.file=/data/dump/log_20130101

Currently when I load the data, I see this: merchantList: [16179:10950], But I want this: merchantList: [16179,10950]. This example is int but I have intentionally kept it as a string since some values can also be a string. Any suggestions where I am going wrong? -- Thanks, -Utkarsh
Re: Using split in updateCSV for SolrCloud 4.4
There is this note for escape: "If an escape is specified, the encapsulator is not used unless also explicitly specified since most formats use either encapsulation or escaping, not both." -- Jack Krupansky

-----Original Message----- From: Utkarsh Sengar Sent: Friday, October 11, 2013 4:35 PM To: solr-user@lucene.apache.org Subject: Re: Using split in updateCSV for SolrCloud 4.4

Interestingly this URL by Jack works: 1. curl 'http://localhost/solr/prodinfo/update/csv?commit=true&f.merchantList.split=true&f.merchantList.separator=%3A&f.merchantList.encapsulator=%22&stream.contentType=text/csv&stream.file=/tmp/test.csv' But this doesn't (i.e. it doesn't split the column): 2. curl 'http://localhost/solr/prodinfo/update/csv?commit=true&f.merchantList.split=true&f.merchantList.separator=%3A&f.merchantList.encapsulator=%22&escape=\&stream.contentType=text/csv&stream.file=/data/dump/catalog.txt' The only difference was escape=\. I added that to Jack's example and it didn't work either. So the culprit was escape=\, not sure why. Thanks, -Utkarsh
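Stepping back, the split semantics being debugged here can be modeled offline. This is a toy Python illustration of what f.&lt;field&gt;.split, separator, and encapsulator are meant to do to a single CSV cell; it is not Solr's parser, and it assumes no escape is in play (which, per Jack's note, disables the encapsulator):

```python
# Toy model of Solr's per-field CSV split (f.<field>.split=true): the cell
# value is re-parsed as a one-row CSV with the per-field separator and
# encapsulator. Illustration only -- not Solr's actual implementation.
import csv
import io

def split_field(cell, separator=":", encapsulator='"'):
    """Split one CSV cell into multiple values, Solr-split style."""
    reader = csv.reader(io.StringIO(cell),
                        delimiter=separator,
                        quotechar=encapsulator)
    return next(reader)

# The merchantList cell from the thread:
print(split_field("16179:10950"))   # ['16179', '10950']
# Encapsulated values may contain the separator without being split:
print(split_field('"a:b":c'))       # ['a:b', 'c']
```

This also shows why dropping the encapsulator (as escape=\ effectively does) changes how the cell is tokenized.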
Setting SolrCloudServer collection
If using one static SolrCloudServer, how can I add a bean to a certain collection? Do I need to update setDefaultCollection() each time? I doubt that's thread safe. Thanks
Re: Solr Cloud hangs when replicating updates
Hey guys, We just hit a deadlock similar to this one on 4.5, and it seems to be related to leaked connections probably due to https://issues.apache.org/jira/browse/SOLR-4327. We're going to apply the suggested change to add method.abort() in the finally block and see if it fixes things. Jessica -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Cloud-hangs-when-replicating-updates-tp4088083p4095061.html Sent from the Solr - User mailing list archive at Nabble.com.
Replace NULL with 0 while Indexing
Hello, One of my indexed fields has NULL values, and I want them replaced with 0 at index time, so that when I search after indexing it gives me 0 instead of NULL. This is my data-config.xml, and duration is the field which has the null values.

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://trdbadhoc/test_results" responseBuffering="adaptive" batchSize="-1" user="results" password="resultsloader"/>
  <document>
    <entity name="Test_Syndrome" pk="id" query="SELECT TS.id AS id, TET.type AS error_type, TS.syndrome AS syndrome, S.start_date, SE.session_id AS sessionid, S.duration, TL.logfile, J.job_number AS job, cluster, S.hostname, platform FROM Test_Syndrome AS TS STRAIGHT_JOIN Session_Errors AS SE ON (SE.test_syndrome_id = TS.id) STRAIGHT_JOIN Session AS S ON (S.id = SE.session_id) STRAIGHT_JOIN Test_Run AS TR ON (TR.session_id = SE.session_id) STRAIGHT_JOIN Test_Log AS TL ON (TL.id = TR.test_log_id) STRAIGHT_JOIN Job AS J ON (J.id = TL.job_id) STRAIGHT_JOIN Cluster AS C ON (C.id = J.cluster_id) STRAIGHT_JOIN Platform ON (TR.platform_id = Platform.id) STRAIGHT_JOIN Test_Error_Type TET ON (SE.test_error_type_id = TET.id)">
      <field column="id" name="id"/>
      <field column="error_type" name="error_type"/>
      <field column="syndrome" name="syndrome"/>
      <field column="sessionid" name="sessionid"/>
      <field column="duration" name="duration"/>
      <field column="logfile" name="logfile"/>
      <field column="job" name="job"/>
      <field column="cluster" name="cluster"/>
      <field column="hostname" name="hostname"/>
      <field column="platform" name="platform"/>
    </entity>
  </document>
</dataConfig>

Please help. Thanks & Regards, Prerna -- View this message in context: http://lucene.472066.n3.nabble.com/Replace-NULL-with-0-while-Indexing-tp4095059.html
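One way to handle this without touching Solr at all is to coalesce the NULL in the DIH query itself; a sketch using MySQL's IFNULL, with only the relevant columns shown (the remaining columns and STRAIGHT_JOINs stay exactly as in the query above):

```sql
-- Sketch: map NULL durations to 0 in the SQL, so DIH never sees a NULL
-- for the duration column (MySQL syntax; COALESCE works too).
SELECT TS.id                 AS id,
       TET.type              AS error_type,
       IFNULL(S.duration, 0) AS duration
       -- ... remaining columns and joins as in the original query ...
FROM Test_Syndrome AS TS
```

Alternatively, a default="0" attribute on the duration field definition in schema.xml should give missing values a 0 at index time, since DIH typically omits NULL columns from the document.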
Re: Setting SolrCloudServer collection
Set the collection param per request. It only uses the default if you don't set it. - Mark On Oct 11, 2013, at 5:26 PM, Mark static.void@gmail.com wrote: If using one static SolrCloudServer how can I add a bean to a certain collection. Do I need to update setDefaultCollection() each time? I doubt that thread safe? Thanks
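In SolrJ 4.x terms, Mark's suggestion looks roughly like the following. This is a sketch, not a verified implementation: the ZooKeeper host, bean variable, and collection name are made up, and error handling is omitted.

```java
// Sketch: one shared CloudSolrServer, collection chosen per request
// rather than via setDefaultCollection() (names here are illustrative).
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.request.UpdateRequest;

CloudSolrServer server = new CloudSolrServer("zkhost:2181");

UpdateRequest req = new UpdateRequest();
// myBean is assumed to be a @Field-annotated bean of your own
req.add(server.getBinder().toSolrInputDocument(myBean));
req.setParam("collection", "collection2"); // overrides the default collection
req.process(server);
```

Because the collection is carried on the request rather than mutated on the shared server, this avoids the thread-safety concern with calling setDefaultCollection() per operation.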
Re: Solr's Filtering approaches
Groups are pharmaceutical research experiments. The user is presented with a graph view; he can select some region and all the groups in that region get included. The user can also modify the groups here, so we didn't maintain group information in the same Solr index but have externalized it. I looked at the post-filter article. So my understanding is that I simply have to extend it as you did and include an implementation for isAllowed(acls[doc], groups). This will filter the documents in the collector, and finally this collector will be returned. Am I right?

@Override public void collect(int doc) throws IOException { if (isAllowed(acls[doc], user, groups)) super.collect(doc); }

Erick, I am interested to know whether I can extend any class that can return me only the bitset of the documents that match the search query. I could then do bitset1.and(bitset2OfGroups) and finally collect only those documents to return to the user. How do I try this approach? Any pointers for bit sets? Thanks - David

On Thu, Oct 10, 2013 at 5:25 PM, Erick Erickson erickerick...@gmail.com wrote: Well, my first question is why 50K groups are necessary, and whether you can simplify that. How a user can manually choose from among that many groups is interesting. But assuming they're all necessary, I can think of two things. If the user can only select ranges, just put in filter queries using ranges. Or possibly both ranges and individual entries, as fq=group:[1A TO 1A] OR group:(2B 45C 98Z) etc. You need to be a little careful how you index these so range queries work properly; in the above you'd miss 2A because it sorts lexicographically, so you'd need to store in some form that sorts like 001A 01A and so on. You wouldn't need to show that form to the user, just form your fq's in the app to work with that form.
If that won't work (you wouldn't want this to get huge), think about a post filter that would only operate on documents that had made it through the select, although how to convey which groups the user selected to the post filter is an open question. Best, Erick

On Wed, Oct 9, 2013 at 12:23 PM, David Philip davidphilipshe...@gmail.com wrote: Hi All, I have an issue in handling filters for one of our requirements and would like to get suggestions for the best approaches.

Use Case:
1. We have a list of groups, and the number of groups can increase up to 1 million. Currently we have almost 90 thousand groups in the Solr search system.
2. Just before the user hits a search, he has options to select the number of groups he wants to retrieve. [The distinct list of these group names for display is retrieved from another Solr index that has more information about groups.]
3. User operation: Say the user selected group 1A - group 1A, and searches for key:cancer.

The current approach I was thinking of is: get search results and filter the query by the group-id list selected by the user. But my concern is that when this group list increases to 50k unique ids, it can cause a lot of delay in getting search results. So I wanted to know whether there are different filtering approaches I can try.

I was also thinking of one more approach, as suggested by my colleague: do an intersection. Get the group ids selected by the user, get the list of group ids from the search results, perform an intersection of both, and then get the entire result set of only those group ids that intersected. Is this a better way? Can I use any cache technique in this case? - David.
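For the post-filter route discussed in this thread, the usual Solr 4.x shape is a PostFilter that returns a DelegatingCollector and drops disallowed documents in collect(). A bare sketch follows; the isAllowed() body and the ACL lookup are placeholders for your own logic, and the query-parser plumbing that creates the filter is omitted.

```java
// Bare sketch of a Solr 4.x post filter. Post filters must report
// cost >= 100 and cache=false to run after the main query and filters.
import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.solr.search.DelegatingCollector;
import org.apache.solr.search.ExtendedQueryBase;
import org.apache.solr.search.PostFilter;

public class GroupAccessFilter extends ExtendedQueryBase implements PostFilter {

    @Override
    public boolean getCache() { return false; }

    @Override
    public int getCost() { return Math.max(super.getCost(), 100); }

    @Override
    public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
        return new DelegatingCollector() {
            @Override
            public void collect(int doc) throws IOException {
                // Only pass docs whose ACL intersects the user's groups.
                if (isAllowed(doc)) {
                    super.collect(doc);
                }
            }
        };
    }

    private boolean isAllowed(int doc) {
        // Placeholder: look up acls[doc] and intersect with the
        // externalized group selection for the current user.
        return true;
    }
}
```

The bitset-intersection idea maps naturally onto this: the delegating collector is exactly where a per-doc membership test against a precomputed bitset of allowed groups would go.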