Re: Are we still using jetty with solr 5?
Thanks Chris and Alex for clarification :) With Regards Aman Tandon On Fri, Feb 27, 2015 at 10:27 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : So are we still using the jetty? Are we still dependent on war file? As explained in the ref guide... https://cwiki.apache.org/confluence/display/solr/Major+Changes+from+Solr+4+to+Solr+5#MajorChangesfromSolr4toSolr5-SolrisNowaStandaloneServer Internally, Solr is still implemented via Servlet APIs and is powered by Jetty -- but this is simply an implementation detail. Deployment as a webapp to other Servlet Containers (or other instances of Jetty) is not supported, and may not work in future 5.x versions of Solr when additional changes are likely to be made to Solr internally to leverage custom networking stack features. -Hoss http://www.lucidworks.com/
About solr recovery
Hi, Our production Solr's replication went offline for some time, but both ZooKeeper and the network were OK, and the Solr JVM was normal. My question: is there any other reason that would put Solr's replicas into the recovering state?
Re: qt.shards in solrconfig.xml
Hi Benson, Shalin, One more thing that I noticed in your configuration is an incorrect definition of default Solr parameters. You should use the lst tag, not list. Oleg

2015-02-27 6:23 GMT+03:00 Shalin Shekhar Mangar shalinman...@gmail.com: Hi Benson, Do not use shards.qt with a leading '/'. See https://issues.apache.org/jira/browse/SOLR-3161 for details. Also note that shards.qt will not be necessary with 5.1 and beyond because of SOLR-6311.

On Fri, Feb 27, 2015 at 8:16 AM, Benson Margulies bimargul...@gmail.com wrote: I apparently am feeling dense; the following does not work.

<requestHandler name="/RNI" class="solr.SearchHandler" default="false">
  <list name="defaults">
    <str name="shards.qt">/RNI</str>
  </list>
  <arr name="components">
    <str>name-indexing-query</str>
    <str>name-indexing-rescore</str>
    <str>facet</str>
    <str>mlt</str>
    <str>highlight</str>
    <str>stats</str>
    <str>debug</str>
  </arr>
</requestHandler>

On Thu, Feb 26, 2015 at 11:33 AM, Jack Krupansky jack.krupan...@gmail.com wrote: I was hoping that Benson was hinting at adding a shards.qt.auto=true parameter so that it would magically use the path from the incoming request - and that this would be the default, since that's what most people would expect. Or, maybe just add a commented-out custom handler that has the shards.qt parameter as suggested, to re-emphasize to people that if they want to use a custom handler in distributed mode, then they will most likely need this parameter. -- Jack Krupansky

On Thu, Feb 26, 2015 at 11:28 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Hello, Given http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201301.mbox/%3c711daae5-c366-4349-b644-8e29e80e2...@gmail.com%3E you can add shards.qt into the handler defaults/invariants.

On Thu, Feb 26, 2015 at 5:40 PM, Benson Margulies bimargul...@gmail.com wrote: A query I posted yesterday amounted to me forgetting that I have to set shards.qt when I use a URL other than plain old '/select' with SolrCloud.
Is there any way to configure a query handler to automate this, so that all queries addressed to '/RNI' get that added in? -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com -- Regards, Shalin Shekhar Mangar.
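For reference, combining both fixes suggested in this thread (use <lst> rather than <list>, and drop the leading '/' from the shards.qt value), the handler would look something like the sketch below. This is an untested reconstruction, not a config anyone in the thread posted verbatim:

```
<requestHandler name="/RNI" class="solr.SearchHandler" default="false">
  <lst name="defaults">
    <!-- no leading '/' here, per SOLR-3161 -->
    <str name="shards.qt">RNI</str>
  </lst>
  <arr name="components">
    <str>name-indexing-query</str>
    <str>name-indexing-rescore</str>
    <str>facet</str>
    <str>mlt</str>
    <str>highlight</str>
    <str>stats</str>
    <str>debug</str>
  </arr>
</requestHandler>
```

With this in place, distributed requests to /RNI carry shards.qt automatically, so clients don't need to pass it per query.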
Re: Solr Document expiration with TTL
Hi, Thanks for the reply. I am just beginning with Solr, so I am not very familiar with its settings. I created the collection1 core with the following command:

bin/solr create -c collection1

Then I modified the managed-schema file to add the required field definitions. There were no changes made in the solrconfig.xml file except for the added <updateRequestProcessorChain default="true">...</updateRequestProcessorChain> block. I can see the code below defined in my solrconfig.xml file by default:

<initParams path="/update/**,/query,/select,/tvrh,/elevate,/spell,/browse">
  <lst name="defaults">
    <str name="df">_text</str>
  </lst>
</initParams>

<initParams path="/update/**">
  <lst name="defaults">
    <str name="update.chain">add-unknown-fields-to-the-schema</str>
  </lst>
</initParams>

*While for the <requestHandler/>, I think it's the one below?*

<requestHandler name="/update/extract" startup="lazy" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="fmap.meta">ignored_</str>
    <str name="fmap.content">_text</str>
  </lst>
</requestHandler>

Thanks, Makailol

On Thu, Feb 26, 2015 at 10:39 PM, Chris Hostetter hossman_luc...@fucit.org wrote:
: If your expire_at_dt field is not populated automatically, let's step
: back and recheck a sanity setting. You said it is a managed schema? Is
: it schemaless as well? With an explicit processor chain? If that's
: the case, your default chain may not be running AT ALL.

yeah ... my only guess here is that even though you posted before that you had this configured in your default chain...

<processor class="solr.processor.DocExpirationUpdateProcessorFactory">
  <int name="autoDeletePeriodSeconds">30</int>
  <str name="ttlFieldName">time_to_live_s</str>
  <str name="expirationFieldName">expire_at_dt</str>
</processor>

...perhaps you have an update.chain=foo type default param configured for your /update handler?

* what does your /update <requestHandler/> config look like?
* are you using the new <initParams/> feature of solr? what does its config look like?

: So, recheck your solrconfig.xml.
: Or add another explicit field population inside the chain, just like
: the example did with TimestampUpdateProcessorFactory:
: https://lucidworks.com/blog/document-expiration/

yeah ... that would help as a sanity check as well ... point is: we need to verify which chain you are using when adding the doc.

: Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
: http://www.solr-start.com/

-Hoss http://www.lucidworks.com/
Re: Solr Document expiration with TTL
Yep, your default URP chain is probably not being triggered due to the initParams. initParams are new in Solr 5, so this is still rough-around-the-edges advice. But try giving your chain a name and adding an explicit update.chain value to the requestHandler section (not the initParams section). Alternatively, since add-unknown-fields is already used, you could move your extra URPs to the start of that chain instead. In fact, if you are doing both timestamps and dynamically adding fields to the schema, you will need to do that anyway. Regards, Alex.

On 27 February 2015 at 08:53, Makailol Charls 4extrama...@gmail.com wrote:

<initParams path="/update/**">
  <lst name="defaults">
    <str name="update.chain">add-unknown-fields-to-the-schema</str>
  </lst>
</initParams>

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/
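To illustrate the first suggestion above (a named chain wired in via an explicit update.chain default on the handler itself), something like the following sketch should work. The chain name "expire-docs" is made up for illustration; the processor settings are the ones from earlier in the thread:

```
<updateRequestProcessorChain name="expire-docs">
  <processor class="solr.processor.DocExpirationUpdateProcessorFactory">
    <int name="autoDeletePeriodSeconds">30</int>
    <str name="ttlFieldName">time_to_live_s</str>
    <str name="expirationFieldName">expire_at_dt</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <!-- RunUpdateProcessorFactory must come last or the doc is never indexed -->
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">expire-docs</str>
  </lst>
</requestHandler>
```

Note that an update.chain default on the handler overrides the one set via initParams, which is the point of this workaround.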
RE: Does shard splitting double host count
Well, if you're going to reindex on a newer version, just start out with the number of shards you feel is appropriate, and reindex. But yes, if you had 3 shards and wanted to split some of them, you'd really have to split all of them (making 6) if you wanted the shards to be about the same size. As to the hosts needed: if the hosts are large enough, you could run 6 shards with 2 replicas (12 cores total) on just 2 hosts. Or on up to 12 hosts. Or something in between. It just depends on how many cores you can fit on a host. -Original Message- From: tuxedomoon [mailto:dancolem...@yahoo.com] Sent: Friday, February 27, 2015 8:16 AM To: solr-user@lucene.apache.org Subject: Does shard splitting double host count I currently have a SolrCloud with 3 shards + replicas, it is holding 130M documents and the r3.large hosts are running out of memory. As it's on 4.2 there is no shard splitting, I will have to reindex to a 4.3+ version. If I had that feature would I need to split each shard into 2 subshards resulting in a total of 6 subshards, in order to keep all shards relatively equal? And since host memory is the problem I'd be migrating subshards to new hosts. So it seems I'd be going from 6 hosts to 12. Are these assumptions correct or is there a way to avoid doubling my host count? -- View this message in context: http://lucene.472066.n3.nabble.com/Does-shard-splitting-double-host-count-tp 4189595.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Dependency Need to include for embedded solr.
On 2/27/2015 12:51 AM, Danesh Kuruppu wrote: I am doing some feasibility studies for moving directly to solr 5.0.0. One more thing related to the standalone server: how is security handled in a Solr standalone server? Let's say I configured my application to use a remote Solr standalone server. 1. How would I enable secure communication between my application and the Solr server? 2. How does the Solr server authenticate users? Solr itself does not contain any security mechanisms, because Solr does not own *any* of the network communication layers where such security must be implemented. It is not currently possible for Solr to implement any reasonable security mechanisms. Eventually (hopefully in the near future), Solr will be a completely standalone application that does not rely on a servlet container, and when that happens, it will be possible to implement security within Solr itself. Right now, configuring SSL is not very hard, and you can also enable authentication in the servlet container. It's my understanding that using certificate-based authentication works already, but if you configure basic authentication (username/password), you will find that any kind of distributed searching (including SolrCloud) will not function correctly. This is because Solr does not currently have any mechanism to provide the username/password when communicating with another instance. Thanks, Shawn
Re: Are we still using jetty with solr 5?
: So are we still using the jetty? Are we still dependent on war file? As explained in the ref guide... https://cwiki.apache.org/confluence/display/solr/Major+Changes+from+Solr+4+to+Solr+5#MajorChangesfromSolr4toSolr5-SolrisNowaStandaloneServer Internally, Solr is still implemented via Servlet APIs and is powered by Jetty -- but this is simply an implementation detail. Deployment as a webapp to other Servlet Containers (or other instances of Jetty) is not supported, and may not work in future 5.x versions of Solr when additional changes are likely to be made to Solr internally to leverage custom networking stack features. -Hoss http://www.lucidworks.com/
Delimited payloads input issue
Hi - we attempt to use payloads to identify different parts of extracted HTML pages and use the DelimitedPayloadTokenFilter to assign the correct payload to the tokens. However, we are having issues with some language analyzers, and issues with some types of content for most regular analyzers. If, for example, we want to assign payloads to the text within an H1 field that contains non-alphanumerics such as `Hello, i am a heading!`, and use |5 as delimiter and payload, we send the following to Solr: `Hello,|5 i|5 am|5 a|5 heading!|5`. This is not going to work because, due to a WordDelimiterFilter, the tokens Hello and heading obviously lose their payload. We also cannot put the payload between the last alphanumeric and the following comma or exclamation mark, because then those characters would become part of the payload if we use the identity encoder, or it would fail if we use another encoder. We could solve this using a custom encoder that only takes the first character and ignores the rest, but this seems rather ugly. On the other hand, we have issues using language-specific tokenizers such as Kuromoji, which will immediately dump the delimited payload so it never reaches the DelimitedPayloadTokenFilter. And if we try Chinese and have the StandardTokenizer enabled, we also lose the delimited payload. Have any of you dealt with this before? Hints to share? Many thanks, Markus
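For context, the usual analyzer wiring for delimited payloads looks something like the sketch below. The field type name is made up; the key point, and the crux of Markus's problem, is that DelimitedPayloadTokenFilterFactory must see each token (with its trailing `|5`) intact, so it has to sit directly after a tokenizer that does not split on or discard the delimiter:

```
<fieldType name="text_payloads" class="solr.TextField">
  <analyzer>
    <!-- whitespace tokenization leaves "heading!|5" as one token -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- consumes the trailing "|5" and stores 5 as the token's payload -->
    <filter class="solr.DelimitedPayloadTokenFilterFactory" delimiter="|" encoder="integer"/>
    <!-- any splitting/normalizing filters must come after the payload is captured -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With tokenizers like Kuromoji or StandardTokenizer there is no filter stage before tokenization, which is why the delimiter is lost before the payload filter ever runs.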
Re: Solr Document expiration with TTL
: There were no changes made in the solrconfig.xml file except added that
: <updateRequestProcessorChain default="true">...</updateRequestProcessorChain>
: block.

ok, first off: if you already *had* another updateRequestProcessorChain that said 'default=true', just adding a new one would be weird and would likely give you errors. you have to consider the whole context of the config and the other updateRequestProcessorChains when you make edits like that.

: <initParams path="/update/**">
:   <lst name="defaults">
:     <str name="update.chain">add-unknown-fields-to-the-schema</str>
:   </lst>
: </initParams>

so that says when you make any requests to a /update handler, it's going to use a default request param of update.chain=add-unknown-fields-to-the-schema. so your updates are not going to the default chain (which you didn't give a name); they are going through the <updateRequestProcessorChain/> with the name "add-unknown-fields-to-the-schema". you should probably remove the chain you added, and instead put the new processors you want in the add-unknown-fields-to-the-schema chain. that's the simplest way to get what you want in place. -Hoss http://www.lucidworks.com/
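Hoss's "simplest way" would amount to something like this sketch. The add-unknown-fields-to-the-schema chain already exists in the schemaless example config; only the DocExpiration processor is new here, and it must be inserted before RunUpdateProcessorFactory (the existing schemaless processors are elided, not removed):

```
<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
  <!-- ... the existing schemaless field-guessing processors stay here ... -->
  <processor class="solr.processor.DocExpirationUpdateProcessorFactory">
    <int name="autoDeletePeriodSeconds">30</int>
    <str name="ttlFieldName">time_to_live_s</str>
    <str name="expirationFieldName">expire_at_dt</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

Because the initParams already route every /update request through this chain, no other config change is needed for the TTL fields to be populated.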
Re: Solr Document expiration with TTL
Hi, Thanks for the reply. I tried adding the following code block inside <updateRequestProcessorChain default="true"> in solrconfig.xml:

<processor class="solr.TimestampUpdateProcessorFactory">
  <str name="fieldName">timestamp_dt</str>
</processor>

and added the field definition in managed-schema:

<field name="timestamp_dt" type="date" stored="true" multiValued="false" />

But then I don't see this field getting populated in the document. Thanks, Makailol

On Thu, Feb 26, 2015 at 8:08 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: If your expire_at_dt field is not populated automatically, let's step back and recheck a sanity setting. You said it is a managed schema? Is it schemaless as well? With an explicit processor chain? If that's the case, your default chain may not be running AT ALL. So, recheck your solrconfig.xml. Or add another explicit field population inside the chain, just like the example did with TimestampUpdateProcessorFactory: https://lucidworks.com/blog/document-expiration/ Regards, Alex.

On 26 February 2015 at 07:52, Makailol Charls 4extrama...@gmail.com wrote: since your time_to_live_s and expire_at_dt fields are both stored, can you confirm that an expire_at_dt field is getting populated by the update processor by doing a simple query for your doc (ie q=id:10seconds) No, the expire_at_dt field does not get populated when we have added a document with the TTL defined in the TTL field. Like with the following query, Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/
Re: Does shard splitting double host count
On 2/27/2015 7:15 AM, tuxedomoon wrote: I currently have a SolrCloud with 3 shards + replicas, it is holding 130M documents and the r3.large hosts are running out of memory. As it's on 4.2 there is no shard splitting, I will have to reindex to a 4.3+ version. If I had that feature would I need to split each shard into 2 subshards resulting in a total of 6 subshards, in order to keep all shards relatively equal? And since host memory is the problem I'd be migrating subshards to new hosts. So it seems I'd be going from 6 hosts to 12. Are these assumptions correct or is there a way to avoid doubling my host count? All shards that result from a split will reside on the same host(s) as the original shard. If you are splitting shards because of memory problems, it is normally a good idea to add hosts and then use ADDREPLICA and DELETEREPLICA to move your shard replicas around ... but that's not strictly required. You may not need a strict doubling of hosts ... adding 1 or 2 may be enough. Because it is a lot cleaner, I recommend building a new collection and reindexing to change the number of shards and hosts. You should be able to use your existing collection without interruption until you're ready to switch ... and if you do not want to reconfigure your application, you can delete the old collection and set up an alias that points the original collection name to the new collection. Coordinating index updates to make sure the new collection is completely up to date can be challenging. If you are having memory problems, be prepared for those memory problems to get at least a little bit worse (and maybe a lot worse) while splitting shards or building a new collection. Thanks, Shawn
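Once on 4.3+, the split itself is a single Collections API call. A sketch, with hypothetical collection/shard names (adjust host and names for your own cluster; the curl line is commented out so the sketch is safe to run as-is). As Shawn notes above, both resulting sub-shards land on the same hosts as the parent, so ADDREPLICA/DELETEREPLICA would follow to move them:

```shell
COLLECTION="collection1"   # hypothetical collection name
SHARD="shard1"             # the shard to split in half
SPLIT_URL="http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=${COLLECTION}&shard=${SHARD}"
echo "$SPLIT_URL"
# curl "$SPLIT_URL"        # issue the actual split against a live node
```

The parent shard stays active during the split and is only marked inactive once both sub-shards are up, so queries are not interrupted.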
Does shard splitting double host count
I currently have a SolrCloud with 3 shards + replicas, it is holding 130M documents and the r3.large hosts are running out of memory. As it's on 4.2 there is no shard splitting, I will have to reindex to a 4.3+ version. If I had that feature would I need to split each shard into 2 subshards resulting in a total of 6 subshards, in order to keep all shards relatively equal? And since host memory is the problem I'd be migrating subshards to new hosts. So it seems I'd be going from 6 hosts to 12. Are these assumptions correct or is there a way to avoid doubling my host count? -- View this message in context: http://lucene.472066.n3.nabble.com/Does-shard-splitting-double-host-count-tp4189595.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Can't index all docs in a local folder with DIH in Solr 5.0.0
Alex, I've created a JIRA ticket: https://issues.apache.org/jira/browse/SOLR-7174 In response to your suggestions below: 1. No exceptions are reported, even with onError removed. 2. ProcessMonitor shows only the very first epub file is being read (repeatedly) 3. I can repeat this on Ubuntu (14.04) by following the same steps. 4. Ticket raised (https://issues.apache.org/jira/browse/SOLR-7174) Additionally (and I've added this on the ticket), if I change the dataConfig to use FileDataSource and PlainTextEntityProcessor, and just list *.txt files, it works!

<dataConfig>
  <dataSource type="FileDataSource" name="bin" />
  <document>
    <entity name="files" dataSource="null" rootEntity="false"
            processor="FileListEntityProcessor"
            baseDir="c:/Users/gt/Documents/HackerMonthly/epub"
            fileName=".*txt">
      <field column="fileAbsolutePath" name="id" />
      <field column="fileSize" name="size" />
      <field column="fileLastModified" name="lastModified" />
      <entity name="documentImport" processor="PlainTextEntityProcessor"
              url="${files.fileAbsolutePath}" format="text" dataSource="bin">
        <field column="plainText" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>

So it's something related to BinFileDataSource and TikaEntityProcessor. Thanks, Gary.

On 26/02/2015 14:24, Gary Taylor wrote: Alex, That's great. Thanks for the pointers. I'll try and get more info on this and file a JIRA issue. Kind regards, Gary.

On 26/02/2015 14:16, Alexandre Rafalovitch wrote: On 26 February 2015 at 08:32, Gary Taylor g...@inovem.com wrote: Alex, Same results with recursive=true / recursive=false. I also tried importing plain text files instead of epub (still using TikaEntityProcessor though) and get exactly the same result - ie. all files fetched, but only one document indexed in Solr. To me, this would indicate that the problem is with the inner DIH entity then. As a next set of steps, I would probably 1) remove both onError statements and see if there is an exception that is being swallowed.
2) run the import under ProcessMonitor and see if the other files are actually being read https://technet.microsoft.com/en-us/library/bb896645.aspx 3) Assume a Windows bug and test this on Mac/Linux 4) File a JIRA with a replication case. If there is a full replication setup, I'll test it on machines I have access to with full debugger step-through. For example, I wonder if BinFileDataSource is somehow not cleaning up after the first file properly on Windows and fails to open the second one. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ -- Gary Taylor | www.inovem.com | www.kahootz.com INOVEM Ltd is registered in England and Wales No 4228932 Registered Office 1, Weston Court, Weston, Berkshire. RG20 8JE kahootz.com is a trading name of INOVEM Ltd.
Solr highlighting of multiple terms, what is the separator of the string that is returned
http://stackoverflow.com/questions/4014820/solr-highlighting-of-multiple-terms tells us how to have multiple snippets returned containing highlighted search terms. My question is: what is the separator of the string that is returned? I'm seeing it as a carriage return, which isn't very helpful as the snippets themselves contain these - does anyone know if we can specify the separator? Thank you -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-highlighting-of-multiple-terms-what-is-the-separator-of-the-string-that-is-returned-tp4189572.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Are we still using jetty with solr 5?
Yes, but these are now implementation details and may change in a 5.x version (as opposed to waiting for 6.0). So, if you are troubleshooting, it is Jetty underneath with a war file. But from the architectural point of view, it is now a black box. So, Tomcat deployments are officially no longer supported. Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 27 February 2015 at 05:24, Aman Tandon amantandon...@gmail.com wrote: Hi, I am trying to understand the new version of solr 5 and when I was trying to stop the solr instance with the command *bin/solr stop -p 8983*, I found this message, *Sending stop command to Solr running on port 8983 ... waiting 5 seconds to allow Jetty process 7028 to stop gracefully.* So are we still using the jetty? Are we still dependent on war file? With Regards Aman Tandon
deploying solr5 in Jetty9 - error No context on this server matched or handled this request. ?
I'm trying a 1st deploy of Solr 5.0.0 in Jetty9 (jetty-distribution-9.2.9.v20150224). I've installed Jetty9:

/etc/init.d/jetty check
Checking arguments to Jetty:
START_INI      =  /usr/local/etc/jetty/base/start.ini
START_D        =  /usr/local/etc/jetty/base/start.d
JETTY_HOME     =  /usr/local/jetty
JETTY_BASE     =  /usr/local/etc/jetty/base
JETTY_CONF     =  /usr/local/jetty/etc/jetty.conf
JETTY_PID      =  /usr/local/etc/jetty/run/jetty.pid
JETTY_START    =  /usr/local/jetty/start.jar
JETTY_LOGS     =  /var/log/jetty
JETTY_STATE    =  /usr/local/etc/jetty/run/jetty.state
CLASSPATH      =
JAVA           =  /usr/lib64/jvm/java-openjdk/bin/java
JAVA_OPTIONS   =  -Djetty.logs=/var/log/jetty -Djetty.home=/usr/local/jetty -Djetty.base=/usr/local/etc/jetty/base -Djava.io.tmpdir=/tmp
JETTY_ARGS     =  jetty.state=/usr/local/etc/jetty/run/jetty.state jetty-logging.xml jetty-started.xml
RUN_CMD        =  /usr/lib64/jvm/java-openjdk/bin/java -Djetty.logs=/var/log/jetty -Djetty.home=/usr/local/jetty -Djetty.base=/usr/local/etc/jetty/base -Djava.io.tmpdir=/tmp -jar /usr/local/jetty/start.jar jetty.state=/usr/local/etc/jetty/run/jetty.state jetty-logging.xml jetty-started.xml
Jetty running pid=2444

It's running:

ps ax | grep jetty
2444 ?     Sl   0:02 /usr/lib64/jvm/java-openjdk/bin/java -Djetty.logs=/var/log/jetty -Djetty.home=/usr/local/jetty -Djetty.base=/usr/local/etc/jetty/base -Djava.io.tmpdir=/tmp -jar /usr/local/jetty/start.jar jetty.state=/usr/local/etc/jetty/run/jetty.state jetty-logging.xml jetty-started.xml start-log-file=/var/log/jetty/start.log
3276 pts/1 S+   0:00 grep --color=auto jetty

I've set up deployment in:

cat /usr/local/etc/jetty/jetty-deploy.xml
<?xml version="1.0"?>
<!DOCTYPE Configure PUBLIC "-//Jetty//Configure//EN" "http://www.eclipse.org/jetty/configure_9_0.dtd">
<Configure id="Server" class="org.eclipse.jetty.server.Server">
  <Call name="addBean">
    <Arg>
      <New id="DeploymentManager" class="org.eclipse.jetty.deploy.DeploymentManager">
        <Set name="contexts">
          <Ref refid="Contexts" />
        </Set>
        <Call name="setContextAttribute">
          <Arg>org.eclipse.jetty.server.webapp.ContainerIncludeJarPattern</Arg>
          <Arg>.*/servlet-api-[^/]*\.jar$</Arg>
        </Call>
        <Call id="webappprovider" name="addAppProvider">
          <Arg>
            <New class="org.eclipse.jetty.deploy.providers.WebAppProvider">
              <Set name="monitoredDirName">/home/hanl/jetty_webapps</Set>
              <Set name="defaultsDescriptor"><Property name="jetty.home" default="." />/etc/webdefault.xml</Set>
              <Set name="scanInterval">1</Set>
              <Set name="extractWars">true</Set>
              <Set name="configurationManager">
                <New class="org.eclipse.jetty.deploy.PropertiesConfigurationManager" />
              </Set>
            </New>
          </Arg>
        </Call>
      </New>
    </Arg>
  </Call>
</Configure>

where I've extracted the solr.war to:

tree -d /home/hanl/jetty_webapps
/home/hanl/jetty_webapps
└── [jetty 4096] solr
    ├── [jetty 4096] css
    │   └── [jetty 4096] styles
    ├── [jetty 4096] img
    │   ├── [jetty 4096] filetypes
    │   └── [jetty 4096] ico
    ├── [jetty 4096] js
    │   ├── [jetty 4096] lib
    │   └── [jetty 4096] scripts
    ├── [jetty 4096] META-INF
    ├── [jetty 4096] tpl
    └── [jetty 4096] WEB-INF
        └── [jetty 4096] lib

13 directories

When I re-start jetty and nav to http://127.0.0.1:8080 I apparently don't find the solr app:

Error 404 - Not Found. No context on this server matched or handled this request. Contexts known to this server are: Powered by Jetty:// Java Web Server

I've obviously misconfigured something. Appreciate any help figuring out what! hanlon
Re: Does shard splitting double host count
What about adding one new leader/replica pair? It seems that would entail a) creating the r3.large instances and volumes b) adding 2 new Zookeeper hosts? c) updating my Zookeeper configs (new hosts, new ids, new SOLR config) d) restarting all ZKs e) restarting SOLR hosts in sequence needed for correct shard/replica assignment f) start indexing again So shards 1,2,3 start with 33% of the docs each. As I start indexing new documents get sharded at 25% per shard. If I reindex a document that exists already in shard2, does it remain in shard2 or could it migrate to another shard, thus removing it from shard2. I'm looking for a migration strategy to achieve 25% docs per shard. I would also consider deleting docs by daterange from shards1,2,3 and reindexing them to redistribute evenly. -- View this message in context: http://lucene.472066.n3.nabble.com/Does-shard-splitting-double-host-count-tp4189595p4189672.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: [ANNOUNCE] Luke 4.10.3 released
Hi Dmitry, In my environment, I cannot reproduce this Pivot error on HotSpot VM 1.7.0; please give me some time... Or, I'll try to make pull requests to https://github.com/DmitryKey/luke for the Pivot version. At any rate, it would be best to manage both the (current) Thinlet and Pivot versions in the same place, as you suggested. Thanks, Tomoko

2015-02-26 22:15 GMT+09:00 Dmitry Kan solrexp...@gmail.com: Sure, it is:

java version "1.7.0_76"
Java(TM) SE Runtime Environment (build 1.7.0_76-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.76-b04, mixed mode)

On Thu, Feb 26, 2015 at 2:39 PM, Tomoko Uchida tomoko.uchida.1...@gmail.com wrote: Sorry, I'm afraid I have not encountered such errors at launch. Seems something is wrong around Pivot, but I have no idea about it. Would you tell me the java version you're using? Tomoko

2015-02-26 21:15 GMT+09:00 Dmitry Kan solrexp...@gmail.com: Thanks, Tomoko, it compiles ok! Now launching produces some errors:

$ java -cp dist/* org.apache.lucene.luke.ui.LukeApplication
Exception in thread "main" java.lang.ExceptionInInitializerError
        at org.apache.lucene.luke.ui.LukeApplication.main(Unknown Source)
Caused by: java.lang.NumberFormatException: For input string: "3 1644336"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Integer.parseInt(Integer.java:492)
        at java.lang.Byte.parseByte(Byte.java:148)
        at java.lang.Byte.parseByte(Byte.java:174)
        at org.apache.pivot.util.Version.decode(Version.java:156)
        at org.apache.pivot.wtk.ApplicationContext.<clinit>(ApplicationContext.java:1704)
        ... 1 more

On Thu, Feb 26, 2015 at 1:48 PM, Tomoko Uchida tomoko.uchida.1...@gmail.com wrote: Thank you for checking it out! Sorry, I forgot to note important information... an ivy jar is needed to compile. The packaging process needs to be organized, but for now, I'm borrowing it from lucene's tools/lib. In my environment, Fedora 20 and OpenJDK 1.7.0_71, it can be compiled and run as follows.
If there are any problems, please let me know.

$ svn co http://svn.apache.org/repos/asf/lucene/sandbox/luke/
$ cd luke/
// copy ivy jar to lib/tools
$ cp /path/to/lucene_solr_4_10_3/lucene/tools/lib/ivy-2.3.0.jar lib/tools/
$ ls lib/tools/
ivy-2.3.0.jar
$ java -version
java version "1.7.0_71"
OpenJDK Runtime Environment (fedora-2.5.3.3.fc20-x86_64 u71-b14)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
$ ant ivy-resolve
...
BUILD SUCCESSFUL
// compile and make jars and run
$ ant dist
...
BUILD SUCCESSFUL
$ java -cp dist/* org.apache.lucene.luke.ui.LukeApplication
...

Thanks, Tomoko

2015-02-26 16:39 GMT+09:00 Dmitry Kan solrexp...@gmail.com: Hi Tomoko, Thanks for the link. Do you have build instructions somewhere? When I executed ant with no params, I get:

BUILD FAILED
/home/dmitry/projects/svn/luke/build.xml:40: /home/dmitry/projects/svn/luke/lib-ivy does not exist.

On Thu, Feb 26, 2015 at 2:27 AM, Tomoko Uchida tomoko.uchida.1...@gmail.com wrote: Thanks! Would you announce at LUCENE-2562 to me and all watchers interested in this issue, when the branch is ready? :) As you know, the current Pivot version (that supports Lucene 4.10.3) is here: http://svn.apache.org/repos/asf/lucene/sandbox/luke/ Regards, Tomoko

2015-02-25 18:37 GMT+09:00 Dmitry Kan solrexp...@gmail.com: Ok, sure. The plan is to make the pivot branch in the current github repo and update its structure accordingly. Once it is there, I'll let you know. Thank you, Dmitry

On Tue, Feb 24, 2015 at 5:26 PM, Tomoko Uchida tomoko.uchida.1...@gmail.com wrote: Hi Dmitry, Thank you for the detailed clarification! Recently, I've created a few patches to the Pivot version (LUCENE-2562), so I'd like to do some more work and keep it up to date. If you would like to work on the Pivot version, may I suggest you fork the github version? The ultimate goal is to donate this to Apache, but at least we will have the common plate. :) Yes, I love the idea of having a common code base.
I've looked at the code of both the github (Thinlet) and Pivot versions; the Pivot version has a very different structure from the github one (I think that is mainly due to the UI framework's requirements). So it seems to be difficult to directly fork github's version
RE: Does shard splitting double host count
You can't just add a new core to an existing collection. You can add the new node to the cloud, but it won't be part of any collection. You're not going to be able to just slide it in as a 4th shard to an established collection of 3 shards. The root of that comes from routing (I'll assume you use default routing, rather than any custom routing). When you index a document into the cloud, it gets a unique id number attached to it. If you have 3 shards, then each shard gets 1/3 of the range of those possible ids. Inserts and/or updates for the same document will have the same id and be routed to the same shard. Shard splitting just divides the range of the shard in half, and copies documents to the 2 new shards based upon where their ids now fall in the new range. That's a little easier to manage than the more complex process of adding one shard, then having to adjust the ranges on all the other shards, and then copy entries that have to move -- all the while ensuring that new adds/updates/deletes are being routed to the correct location based upon whether the original has been copied over to the new ranges or not, yada, yada, yada. I believe there have been some discussions about how to add a capability like that to solr (i.e. adjust shard ranges and have documents moved and handled correctly), but I don't think it's even in 5.0. Now, if you feel the need to go down this path of adding a single shard to a 3 shard collection, here's something similar. Add your new solr node to the cloud. Then create a 1 shard, 2 replica collection called collectionPart2. Also add a query alias for TotalCollection that points to collectionPart1, collectionPart2. That way a query will get processed by all 4 of your shards. Now this will make indexing more difficult, because you'll have to send your new documents to collectionPart2 until that collection's shard gets about as big as the shards on your 3 shard collection.
But some source data can be split up like that fairly easily, especially a sequential data source. For example, if indexing twitter or email feeds, you can create a new collection with an appropriate shard/replica configuration and feed in a day (or month, or whatever) of data. Then repeat with a new collection for the next set. Keep the query alias updated to span the collections you're interested in. -Original Message- From: tuxedomoon [mailto:dancolem...@yahoo.com] Sent: Friday, February 27, 2015 12:43 PM To: solr-user@lucene.apache.org Subject: Re: Does shard splitting double host count What about adding one new leader/replica pair? It seems that would entail a) creating the r3.large instances and volumes b) adding 2 new Zookeeper hosts? c) updating my Zookeeper configs (new hosts, new ids, new SOLR config) d) restarting all ZKs e) restarting SOLR hosts in sequence needed for correct shard/replica assignment f) start indexing again So shards 1,2,3 start with 33% of the docs each. As I start indexing new documents get sharded at 25% per shard. If I reindex a document that exists already in shard2, does it remain in shard2 or could it migrate to another shard, thus removing it from shard2. I'm looking for a migration strategy to achieve 25% docs per shard. I would also consider deleting docs by daterange from shards1,2,3 and reindexing them to redistribute evenly. -- View this message in context: http://lucene.472066.n3.nabble.com/Does-shard-splitting-double-host-count-tp4189595p4189672.html Sent from the Solr - User mailing list archive at Nabble.com.
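The alias suggested above maps to a single Collections API CREATEALIAS call. A sketch, using the hypothetical collection names from the message (TotalCollection, collectionPart1, collectionPart2); the curl line is commented out so this runs safely anywhere:

```shell
ALIAS_URL="http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=TotalCollection&collections=collectionPart1,collectionPart2"
echo "$ALIAS_URL"
# curl "$ALIAS_URL"   # queries to TotalCollection then fan out across both collections
```

Re-running CREATEALIAS with a longer collections list is how you'd keep the alias spanning each newly added time-sliced collection.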
Re: how to debug solr performance degradation
On 2/27/2015 12:51 PM, Tang, Rebecca wrote: Thank you guys for all the suggestions and help! I've identified the main culprit with debug=timing. It was the mlt component. After I removed it, the speed of the query went back to reasonable. Another culprit is the expand component, but I can't remove it. We've downgraded our amazon instance to 60G mem with general purpose SSD and the performance is pretty good. It's only 70 cents/hr versus 2.80/hr for the 244G mem instance :) I also added all the suggested JVM parameters. Now I have a gc.log that I can dig into. One thing I would like to understand is how memory is managed by solr. If I do 'top -u solr', I see something like this:

Mem:  62920240k total, 62582524k used,   337716k free,   133360k buffers
Swap:        0k total,        0k used,        0k free, 54500892k cached

  PID USER  PR NI VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
 4266 solr  20  0 192g 5.1g 854m S  0.0  8.4 37:09.97 java

There are two things: 1) Mem: 62920240k total, 62582524k used. I think this is what the solr admin physical memory bar graph reports on. Can I assume that most of the mem is used for loading part of the index? 2) And then there's the VIRT 192g and RES 5.1g. What is the 5.1g RES (physical memory) that is used by solr? The total and used values from top refer to *all* memory in the entire machine, and it does match the physical memory graph in the admin UI. If you notice that the cached value is 54GB, that's where most of the memory usage is actually happening. This is the OS disk cache -- the OS is automatically using extra memory to cache data on the disk. You are only caching about a third of your index, which may not be enough for good performance, especially with complex queries. The VIRT (virtual) and RES (resident) values describe how Java is using memory from the OS point of view. The java process has allocated 5.1GB of RAM for the heap and all other memory structures. 
The VIRT number is the total amount of *address space* (virtual memory, not actual memory) that the process has allocated. For Solr, this will typically be (approximately) the size of all your indexes plus the RES and SHR values. Solr (Lucene) uses the mmap functionality in the operating system for all disk access by default (configurable) -- this means that it maps the file on the disk into virtual memory. This makes it so that a program doesn't need to use disk I/O calls to access the data ... it just pretends that the file is sitting in memory. The operating system takes care of translating those memory reads and writes into disk access. All memory that is not explicitly allocated to a program is automatically used to cache that disk access -- this is the cached number from top that I already mentioned. http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html http://en.wikipedia.org/wiki/Page_cache Thanks, Shawn
Re: Does shard splitting double host count
I'd forgotten that -DzkHost refers to the Zookeeper hosts, not the SOLR hosts. Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Does-shard-splitting-double-host-count-tp4189595p4189703.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how to debug solr performance degradation
Thank you guys for all the suggestions and help! I've identified the main culprit with debug=timing. It was the mlt component. After I removed it, the speed of the query went back to reasonable. Another culprit is the expand component, but I can't remove it. We've downgraded our amazon instance to 60G mem with general purpose SSD and the performance is pretty good. It's only 70 cents/hr versus 2.80/hr for the 244G mem instance :) I also added all the suggested JVM parameters. Now I have a gc.log that I can dig into. One thing I would like to understand is how memory is managed by solr. If I do 'top -u solr', I see something like this:

Mem:  62920240k total, 62582524k used,   337716k free,   133360k buffers
Swap:        0k total,        0k used,        0k free, 54500892k cached

  PID USER  PR NI VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
 4266 solr  20  0 192g 5.1g 854m S  0.0  8.4 37:09.97 java

There are two things: 1) Mem: 62920240k total, 62582524k used. I think this is what the solr admin physical memory bar graph reports on. Can I assume that most of the mem is used for loading part of the index? 2) And then there's the VIRT 192g and RES 5.1g. What is the 5.1g RES (physical memory) that is used by solr? Rebecca Tang Applications Developer, UCSF CKM Industry Documents Digital Libraries E: rebecca.t...@ucsf.edu On 2/25/15 7:57 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Lots of suggestions here already. +1 for those JVM params from Boogie and for looking at JMX. Rebecca, try SPM http://sematext.com/spm (it will look at JMX for you, among other things); it may save you time figuring out JVM/heap/memory/performance issues. If you can't tell what's slow via SPM, we can have a look at your metrics (charts are sharable) and may be able to help you faster than guessing. 
Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr Elasticsearch Support * http://sematext.com/ On Wed, Feb 25, 2015 at 4:27 PM, Erick Erickson erickerick...@gmail.com wrote: Before diving in too deeply, try attaching debug=timing to the query. Near the bottom of the response there'll be a list of the time taken by each _component_. So there'll be separate entries for query, highlighting, etc. This may not show any surprises; you might be spending all your time scoring. But it's worth doing as a check and might save you from going down some dead-ends. I mean, if your query winds up spending 80% of its time in the highlighter, you know where to start looking. Best, Erick On Wed, Feb 25, 2015 at 12:01 PM, Boogie Shafer boogie.sha...@proquest.com wrote: rebecca, you probably need to dig into your queries, but if you want to force/preload the index into memory you could try doing something like cat `find /path/to/solr/index` > /dev/null if you haven't already reviewed the following, you might take a look here https://wiki.apache.org/solr/SolrPerformanceProblems perhaps going back to a very vanilla/default solr configuration and building back up from that baseline to better isolate what specific setting might be impacting your environment From: Tang, Rebecca rebecca.t...@ucsf.edu Sent: Wednesday, February 25, 2015 11:44 To: solr-user@lucene.apache.org Subject: RE: how to debug solr performance degradation Sorry, I should have been more specific. I was referring to the solr admin UI page. Today we started up an AWS instance with 240 G of memory to see whether fitting all of our index (183G) in memory, with enough left over for the JVM, could improve the performance. I attached the admin UI screen shot with the email. The top bar is "Physical Memory" and we have 240.24 GB, but only 4% (9.52 GB) is used. The next bar is Swap Space and it's at 0.00 MB. The bottom bar is JVM Memory, which is at 2.67 GB and the max is 26G. 
My understanding is that when Solr starts up, it reserves some memory for the JVM, and then it tries to use up as much of the remaining physical memory as possible. And I used to see the physical memory at anywhere between 70% to 90+%. Is this understanding correct? And now, even with 240G of memory, our index is performing at 10 - 20 seconds for a query. Granted that our queries have fq's and highlighting and faceting, I think with a machine this powerful I should be able to get the queries executed under 5 seconds. This is what we send to Solr: q=(phillip%20morris) wt=json start=0 rows=50 facet=true facet.mincount=0 facet.pivot=industry,collection_facet facet.pivot=availability_facet,availabilitystatus_facet facet.field=dddate fq%3DNOT(pg%3A1%20AND%20(dt%3A%22blank%20document%22%20OR%20dt%3A%22blank%20page%22%20OR%20dt%3A%22file%20folder%22%20OR%20dt%3A%22file%20folder%20begin%22%20OR%20dt%3A%22file%20folder%20cover%22%20OR%20dt%3A%22file%20folder
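For reference, Erick's debug=timing suggestion quoted earlier in this thread can be tried directly against a query like the one above; a sketch against a hypothetical local core:

```shell
# Per-component timing: the debug section of the response breaks QTime down
# into prepare/process time for query, facet, mlt, highlight, stats, etc.
curl "http://localhost:8983/solr/collection1/select?q=(phillip%20morris)&rows=0&wt=json&indent=true&debug=timing"
```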
Re: Does shard splitting double host count
On 2/27/2015 11:42 AM, tuxedomoon wrote: What about adding one new leader/replica pair? It seems that would entail a) creating the r3.large instances and volumes b) adding 2 new Zookeeper hosts? c) updating my Zookeeper configs (new hosts, new ids, new SOLR config) d) restarting all ZKs e) restarting SOLR hosts in sequence needed for correct shard/replica assignment f) start indexing again You do not need additional zookeeper hosts to run more Solr hosts. Three hosts are all that is required for a fully redundant ZK ensemble, no matter how many Solr hosts are in your cloud. I'm not sure what you're gaining by restarting the existing Solr hosts, either. If you want to add more Solr hosts, just start them up with the correct parameter (-DzkHost, etc) so they register themselves with zookeeper. They will immediately be available for replica migrations or anything else you might want to do. Thanks, Shawn
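Adding a node the way Shawn describes is just a matter of pointing the new Solr at the existing ZK ensemble; a sketch with hypothetical host names (in Solr 5, -c starts the node in SolrCloud mode):

```shell
# Start an extra SolrCloud node against the existing 3-host ZK ensemble;
# no zookeeper changes and no restarts of the other Solr hosts are needed
bin/solr start -c -p 8983 -z zk1:2181,zk2:2181,zk3:2181
```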
Unstemmed searching
Several months ago Tom Burton-West asked: The Solr wiki says "A repeated question is how can I have the original term contribute more to the score than the stemmed version? In Solr 4.3, the KeywordRepeatFilterFactory has been added to assist this functionality." https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming (Full section reproduced below.) I can see how in the example from the wiki reproduced below that both the stemmed and original term get indexed, but I don't see how the original term gets more weight than the stemmed term. Wouldn't this require a filter that gives terms with the keyword attribute more weight? What am I missing? Tom I've read the follow-ups to that message, and have used the KeywordRepeatFilterFactory in the analyzer chain for both index and query as follows:

<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.ICUFoldingFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.KeywordRepeatFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>

And although this may be giving some amount of boost to the unstemmed form, our users are still asking for the ability to specify that stemming is turned off altogether. I know that this can be done by copying every field to an unstemmed version of that field, but it seems that with the KeywordRepeatFilter already in play, there should be _something_ that can be done to disable stemming dynamically at query time without needing to copy all the fields and re-index everything. 
So that is X, and the possible Y's I've thought of that might accomplish this are:

1) Allow a Dummy Snowball filter at query time
 * Create org.tartarus.snowball.ext.DummyStemmer, which does no stemming at all.
 * Add a checkbox to the interface to allow the user to select unstemmed searching.
 * Devise a way for a parameter specified with the query to be passed through to the <filter class="solr.SnowballPorterFilterFactory"/> as the language to use.
 * Use either English or Dummy to perform either stemmed or unstemmed searching.

2) Consult the keyword attribute, perhaps in a function query.

Any thoughts on either of these ideas, or different approaches to solve the problem? thanks in advance Robert Haschart
Suggestion on indexing complex xml
Hi, I am able to index XML with same-name elements at different XPaths by using the XPathEntityProcessor forEach attribute (e.g. below). Just wondering if there is a better way to handle this xml format. a) Is there any better way to handle this scenario, as the xml file will have multiple sub-menu elements (e.g. A, B, C, D...) and I will have to specify each in the forEach attribute? b) How to differentiate the xml result from two entities defined in data-config.xml? Example xml:

<menu>
  <A>
    <name>Waffles</name>
    <price>$2.95</price>
  </A>
  <B>
    <name>Strawberry</name>
    <description>Light waffles covered with strawberries</description>
    <price>$3.95</price>
  </B>
</menu>

Example dataConfig:

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity name="actions" processor="XPathEntityProcessor" stream="true"
            forEach="/menu/A | /menu/B" url="C:/tmp/menu.xml"
            transformer="RegexTransformer,DateFormatTransformer">
      <field column="name" xpath="/menu/A/name"/>
      <field column="price" xpath="/menu/A/price"/>
      <field column="name" xpath="/menu/B/name"/>
      <field column="description" xpath="/menu/B/description"/>
      <field column="price" xpath="/menu/B/price"/>
    </entity>
  </document>
</dataConfig>

Result:

"docs": [
  { "name": "Waffles", "value": "$2.95" },
  { "description": "Light waffles covered with strawberries", "name": "Strawberry", "value": "$3.95" }
]
Re: Encrypt Data in SOLR
Don't store it? stored=false, indexed=true. You may need to give a bit more detail, really. There is no built-in encryption; if you encrypt it, you cannot search it. So you should concentrate on security of access instead, and/or full-disk encryption (at some cost to performance). Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 27 February 2015 at 16:31, EXTERNAL Taminidi Ravi (ETI, AA-AS/PAS-PTS) external.ravi.tamin...@us.bosch.com wrote: HI, Does anyone know how to encrypt the Solr data stored? e.g. I have to do a search on the address for the customer, but the data should not be readable by naked eyes. Thanks Ravi
Re: Suggestion on indexing complex xml
On 27 February 2015 at 16:11, Vishal Swaroop vishal@gmail.com wrote: I am able to index XML with same name element but in different XPATH by using XPathEntityProcessor forEach (e.g. below) Just wondering if there is better way to handle this xml format. DIH's XML parser is rather limited and literal-minded. You could instead pre-process XML with XSLT: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-UsingXSLTtoTransformXMLIndexUpdates Or look into something like SIREn: http://siren.solutions/siren/overview/ Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/
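The XSLT route from the ref guide link above boils down to dropping a stylesheet into the core's conf/xslt directory and naming it with the tr parameter on an update request; a sketch (the stylesheet and file names are hypothetical):

```shell
# menu.xsl must live in the core's conf/xslt/ directory and transform the
# custom <menu> format into standard <add><doc>... update XML
curl "http://localhost:8983/solr/collection1/update?commit=true&tr=menu.xsl" \
  -H "Content-Type: text/xml" --data-binary @menu.xml
```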
Encrypt Data in SOLR
HI, Do Any one know how to encrypt the Solr Data Stored ? e.g. I have to do a search on the address for the customer, but the data should be not able to read by naked eyes? Thanks Ravi
RE: Unstemmed searching
Hello Robert. Unstemmed terms have slightly higher IDF so they gain more weight, but stemmed tokens usually have slightly higher TF, so the differences are marginal at best, especially when using the standard TFIDFSimilarity. However, by setting a payload for stemmed terms, you can recognize them at search time and give them a lower score. You need a custom similarity when dealing with payloads, so it is possible to tune the weight without reindexing. Markus -Original message- From:Robert Haschart rh...@virginia.edu Sent: Friday 27th February 2015 22:01 To: solr-user@lucene.apache.org Subject: Unstemmed searching Several months ago Tom-Burton West asked: The Solr wiki says A repeated question is how can I have the original term contribute more to the score than the stemmed version? In Solr 4.3, the KeywordRepeatFilterFactory has been added to assist this functionality. https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming (Full section reproduced below.) I can see how in the example from the wiki reproduced below that both the stemmed and original term get indexed, but I don't see how the original term gets more weight than the stemmed term. Wouldn't this require a filter that gives terms with the keyword attribute more weight? What am I missing? 
Tom I've read the follow-ups to that message, and have used the KeywordRepeatFilterFactory in the analyzer chain for both index and query as follows: tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.ICUFoldingFilterFactory / filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true / filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.KeywordRepeatFilterFactory/ filter class=solr.SnowballPorterFilterFactory/ filter class=solr.RemoveDuplicatesTokenFilterFactory/ And although this may be giving some amount of boost to the unstemmed form, our users are still asking for the ability to specify that stemming is turned off altogether. I know that this can be done by copying every field to an unstemmed version of that field, but it seems that with the KeywordRepeatFilter already in play, that there should be _something_ that can be done to disable stemming dynamically at query time without needing to copy all the fields and re-index everything. So that is X and possible Y's that might accomplish this that I've thought of are: 1) Allow Dummy Snowball filter at query time * Create org.tartarus.snowball.ext.DummyStemmer which does no stemming at all. * Add a checkbox to the interface to allow the user to select unstemmed searching * Devise a way for a parameter specified with the query to be passed through to the filter class=solr.SnowballPorterFilterFactory / as the language to use * Use either English or Dummy to perform either stemmed searching or unstemmed searching. 2) Consult the keyword attribute perhaps in a function query Any thoughts on either of these ideas, of different approaches to solve the problem. thanks in advance Robert Haschart
Log numfound, qtime, ...
Hello everyone, Here's my need: I'd like to log Solr responses so as to compute some business statistics. I'd like to report, on a daily/weekly/yearly/whateverly basis, the following KPIs:

- Most popular requests (hits)
- Average numfound for each request
- Average response time for each request
- Requests that have returned an error
- Requests that have a numfound of 0.

The idea is to give the searchandizer the keys to analyze and enhance in real time the relevancy of his data. I think it's not the job of a developer to detect that the keyword TV never has results because Television is the referring word in the whole catalog, for instance. The searchandizer should analyze this at any time and provide the correct synonyms to improve relevance. I'm using Solr with PHP and the Solarium library. Actually the only way I found to manage this is the following:

1. The user sends the request
2. Nginx intercepts the request, and forwards it to a PHP app
3. The PHP app loads the Solarium library and forwards the request to Solr/Jetty
4. Solr replies with JSON and Solarium turns it into a PHP Solarium Response object
5. The PHP app sends the user the raw JSON through NGINX (as if it were Jetty)
6. The PHP app stores the query, the QTime and the numfound in a database

I think I'll soon get into performance issues, as you can guess. Do you know a better approach? Thanks, Ben -- View this message in context: http://lucene.472066.n3.nabble.com/Log-numfound-qtime-tp4189561.html Sent from the Solr - User mailing list archive at Nabble.com.
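One alternative to proxying every request through PHP: Solr's own request log already records the params, numfound (as hits) and QTime of every request, so the KPIs above can be computed offline from the log instead of inline. A rough sketch against a couple of fabricated sample lines shaped like the default request-log format (paths and queries are hypothetical):

```shell
# Fabricated sample lines in the shape of Solr's request log
cat > /tmp/solr_sample.log <<'EOF'
INFO  [collection1] webapp=/solr path=/select params={q=tv&wt=json} hits=0 status=0 QTime=3
INFO  [collection1] webapp=/solr path=/select params={q=television&wt=json} hits=42 status=0 QTime=7
EOF

# Extract params, hits and QTime, then report the zero-result queries
grep "path=/select" /tmp/solr_sample.log \
  | sed -n 's/.*params={\(.*\)} hits=\([0-9]*\) status=[0-9]* QTime=\([0-9]*\).*/\1 \2 \3/p' \
  | awk '$2 == 0 { print "zero results:", $1 }'
# -> zero results: q=tv&wt=json
```

The same extracted triples feed averages, error counts (status != 0), and top queries with sort | uniq -c, without touching the request path at all.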
Re: deploying solr5 in Jetty9 - error No context on this server matched or handled this request. ?
Thanks to all for the info. I'd made the mistake of starting with 3rd party tutorials which seem to miss some of the salient details! RTFM'ing the official stuff now ... I would strongly recommend that you simply run Solr 5 with the jetty server (and the bin/solr script) that's included in the binary download. At some point in the future, you will not be able to deploy Solr in your own servlet container -- it will be a standalone application, NOT a war. Is that true? I didn't understand that yet from what I've read. I'm moving from a tomcat8 setup. The only reason I ever needed tomcat was to run Solr. The issue's the same here -- I don't need jetty for anything else (atm). I *DO* need to end up with a multi-core (not multi-instance, iiuc) setup. If that's doable with a 'standalone' solr5, then I think that may be the wise advice here. hanlon
Re: deploying solr5 in Jetty9 - error No context on this server matched or handled this request. ?
On 2/27/2015 4:29 PM, Shawn Heisey wrote: On 2/27/2015 10:59 AM, h15...@mailas.com wrote: I'm trying a 1st deploy of Solr 5.0.0 in Jetty9 (jetty-distribution-9.2.9.v20150224). snip I've obviously misconfigured something. Appreciate any help figuring out what! Followup: I was able to get Solr 5.0.0 started under the jetty9 distribution you mentioned. I unpacked the jetty archive, changed into the unpacked directory, and did these extremely simple and fast steps:

* Copied solr.war from solr-5.0.0/server/webapps (in the Solr download) into webapps.
* Copied the logging jars from solr-5.0.0/server/lib/ext (in the Solr download) into lib/ext.
* Created a solr directory.
* Created solr/solr.xml with one line: <solr/>

Once I did these things, running java -jar start.jar worked, and browsing to http://server:8080/solr brought up the admin UI (with no cores). If you do not know how to take this information and adapt it for your install, then you should stick to running Solr with the scripts and jetty included in the Solr download. Thanks, Shawn
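Shawn's steps, written out as shell (SOLR_DL is a placeholder for the unpacked Solr 5.0.0 binary download; run from inside the unpacked Jetty 9 distribution directory):

```shell
SOLR_DL=/path/to/solr-5.0.0   # placeholder: unpacked Solr binary download

cp "$SOLR_DL/server/webapps/solr.war" webapps/
cp "$SOLR_DL/server/lib/ext/"*.jar lib/ext/
mkdir -p solr
echo '<solr/>' > solr/solr.xml   # minimal solr.xml; solr home defaults to ./solr

java -jar start.jar              # then browse to http://server:8080/solr
```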
Re: Unstemmed searching
Passing query params down into the analysis chain has been discussed before, but I think it is a bit controversial/complex. How about a more high-level approach to be able to change the query analyzer, e.g. [f.field.]q.analyzer=analyzer|fieldType Then query parsers would use the specified analyzer for a field instead of the schema-defined one. About your Dummy language, it would avoid stemming, but would not avoid false matches against stemmed words that accidentally match the query word. Example: books gets stemmed as books,book. You search for q=book a ticket&lang=dummy, and still get a match on the books document. Or is there a way to affect whether a token matches or not based on its payload? A common workaround is to use a customized stemmer which prefixes all stemmed terms with a special unicode character, so you can totally avoid them if you need to. We discuss the option of deboosting certain token types (stems, synonyms etc) in https://issues.apache.org/jira/browse/LUCENE-3130 but that issue never resulted in anything. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com 27. feb. 2015 kl. 22.13 skrev Markus Jelsma markus.jel...@openindex.io: Hello Robert. Unstemmed terms have slightly higher IDF so they gain more weight, but stemmed tokens usually have slightly higher TF, so differences are marginal at best, especially when using standard TFIDFSimilarity. However, by setting a payload for stemmed terms, you can recognize them at search time and give them a lower score. You need a custom similarity when dealing with payloads so it is possible to tune the weight without reindexing. Markus -Original message- From:Robert Haschart rh...@virginia.edu Sent: Friday 27th February 2015 22:01 To: solr-user@lucene.apache.org Subject: Unstemmed searching Several months ago Tom-Burton West asked: The Solr wiki says A repeated question is how can I have the original term contribute more to the score than the stemmed version? 
In Solr 4.3, the KeywordRepeatFilterFactory has been added to assist this functionality. https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming (Full section reproduced below.) I can see how in the example from the wiki reproduced below that both the stemmed and original term get indexed, but I don't see how the original term gets more weight than the stemmed term. Wouldn't this require a filter that gives terms with the keyword attribute more weight? What am I missing? Tom I've read the follow-ups to that message, and have used the KeywordRepeatFilterFactory in the analyzer chain for both index and query as follows: tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.ICUFoldingFilterFactory / filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true / filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.KeywordRepeatFilterFactory/ filter class=solr.SnowballPorterFilterFactory/ filter class=solr.RemoveDuplicatesTokenFilterFactory/ And although this may be giving some amount of boost to the unstemmed form, our users are still asking for the ability to specify that stemming is turned off altogether. I know that this can be done by copying every field to an unstemmed version of that field, but it seems that with the KeywordRepeatFilter already in play, that there should be _something_ that can be done to disable stemming dynamically at query time without needing to copy all the fields and re-index everything. So that is X and possible Y's that might accomplish this that I've thought of are: 1) Allow Dummy Snowball filter at query time * Create org.tartarus.snowball.ext.DummyStemmer which does no stemming at all. 
* Add a checkbox to the interface to allow the user to select unstemmed searching * Devise a way for a parameter specified with the query to be passed through to the filter class=solr.SnowballPorterFilterFactory / as the language to use * Use either English or Dummy to perform either stemmed searching or unstemmed searching. 2) Consult the keyword attribute perhaps in a function query Any thoughts on either of these ideas, of different approaches to solve the problem. thanks in advance Robert Haschart
Re: deploying solr5 in Jetty9 - error No context on this server matched or handled this request. ?
Followup: I was able to get Solr 5.0.0 started under the jetty9 distribution you mentioned. I unpacked the jetty archive, changed into the unpacked directory, and did these extremely simple and fast steps: * Copied solr.war from solr-5.0.0/server/webapps (in the Solr download) into webapps. * Copied logging jars from solr-5.0.0/server/lib/ext (in the Solr download) into lib/ext. * Created a solr directory. * Created solr/solr.xml with one line: <solr/> Once I did these things, running java -jar start.jar worked, and browsing to http://server:8080/solr brought up the admin UI (with no cores). If you do not know how to take this information and adapt it for your install, then you should stick to running Solr with the scripts and jetty included in the Solr download. That I've managed now. Specifically cp'ing the solr webapps* into the jetty dist's ./webapps/, etc. What I've NOT managed to do is get jetty to see/use the solr webapp if it's installed into a different/other location. Yet.
Re: deploying solr5 in Jetty9 - error No context on this server matched or handled this request. ?
In 5.0 the new way is to not use a servlet container, just use the start/stop scripts. You should find a ...solr/bin/solr that you use to start/stop/whatever. You can still run with a normal servlet container, but is there a particular reason you need to? If not, just use the start/stop commands from the script. Best, Erick On Fri, Feb 27, 2015 at 9:59 AM, h15...@mailas.com wrote: I'm trying a 1st deploy of Solr 5.0.0 in Jetty9 (jetty-distribution-9.2.9.v20150224). I've installed Jetty9:

/etc/init.d/jetty check
Checking arguments to Jetty:
START_INI    = /usr/local/etc/jetty/base/start.ini
START_D      = /usr/local/etc/jetty/base/start.d
JETTY_HOME   = /usr/local/jetty
JETTY_BASE   = /usr/local/etc/jetty/base
JETTY_CONF   = /usr/local/jetty/etc/jetty.conf
JETTY_PID    = /usr/local/etc/jetty/run/jetty.pid
JETTY_START  = /usr/local/jetty/start.jar
JETTY_LOGS   = /var/log/jetty
JETTY_STATE  = /usr/local/etc/jetty/run/jetty.state
CLASSPATH    =
JAVA         = /usr/lib64/jvm/java-openjdk/bin/java
JAVA_OPTIONS = -Djetty.logs=/var/log/jetty -Djetty.home=/usr/local/jetty -Djetty.base=/usr/local/etc/jetty/base -Djava.io.tmpdir=/tmp
JETTY_ARGS   = jetty.state=/usr/local/etc/jetty/run/jetty.state jetty-logging.xml jetty-started.xml
RUN_CMD      = /usr/lib64/jvm/java-openjdk/bin/java -Djetty.logs=/var/log/jetty -Djetty.home=/usr/local/jetty -Djetty.base=/usr/local/etc/jetty/base -Djava.io.tmpdir=/tmp -jar /usr/local/jetty/start.jar jetty.state=/usr/local/etc/jetty/run/jetty.state jetty-logging.xml jetty-started.xml
Jetty running pid=2444

It's running:

ps ax | grep jetty
2444 ?     Sl  0:02 /usr/lib64/jvm/java-openjdk/bin/java -Djetty.logs=/var/log/jetty -Djetty.home=/usr/local/jetty -Djetty.base=/usr/local/etc/jetty/base -Djava.io.tmpdir=/tmp -jar /usr/local/jetty/start.jar jetty.state=/usr/local/etc/jetty/run/jetty.state jetty-logging.xml jetty-started.xml start-log-file=/var/log/jetty/start.log
3276 pts/1 S+  0:00 grep --color=auto jetty

I've set up deployment in /usr/local/etc/jetty/jetty-deploy.xml: <?xml version="1.0"?>
<!DOCTYPE Configure PUBLIC "-//Jetty//Configure//EN" "http://www.eclipse.org/jetty/configure_9_0.dtd">
<Configure id="Server" class="org.eclipse.jetty.server.Server">
  <Call name="addBean">
    <Arg>
      <New id="DeploymentManager" class="org.eclipse.jetty.deploy.DeploymentManager">
        <Set name="contexts">
          <Ref refid="Contexts"/>
        </Set>
        <Call name="setContextAttribute">
          <Arg>org.eclipse.jetty.server.webapp.ContainerIncludeJarPattern</Arg>
          <Arg>.*/servlet-api-[^/]*\.jar$</Arg>
        </Call>
        <Call id="webappprovider" name="addAppProvider">
          <Arg>
            <New class="org.eclipse.jetty.deploy.providers.WebAppProvider">
              <Set name="monitoredDirName">/home/hanl/jetty_webapps</Set>
              <Set name="defaultsDescriptor"><Property name="jetty.home" default="."/>/etc/webdefault.xml</Set>
              <Set name="scanInterval">1</Set>
              <Set name="extractWars">true</Set>
              <Set name="configurationManager">
                <New class="org.eclipse.jetty.deploy.PropertiesConfigurationManager"/>
              </Set>
            </New>
          </Arg>
        </Call>
      </New>
    </Arg>
  </Call>
</Configure>

where I've extracted the solr.war to:

tree -d /home/hanl/jetty_webapps
/home/hanl/jetty_webapps
└── [jetty 4096] solr
    ├── [jetty 4096] css
    │   └── [jetty 4096] styles
    ├── [jetty 4096] img
    │   ├── [jetty 4096] filetypes
    │   └── [jetty 4096] ico
    ├── [jetty 4096] js
    │   ├── [jetty 4096] lib
    │   └── [jetty 4096] scripts
    ├── [jetty 4096] META-INF
    ├── [jetty 4096] tpl
    └── [jetty 4096] WEB-INF
        └── [jetty 4096] lib
13 directories

When I re-start jetty and nav to http://127.0.0.1:8080 I apparently don't find the solr app: Error 404 - Not Found. No context on this server matched or handled this request. Contexts known to this server are: Powered by Jetty:// Java Web Server I've obviously misconfigured something. Appreciate any help figuring out what! hanlon
Are we still using jetty with solr 5?
Hi, I am trying to understand the new version of solr 5, and when I was trying to stop the solr instance with the command *bin/solr stop -p 8983*, I found this message: *Sending stop command to Solr running on port 8983 ... waiting 5 seconds to allow Jetty process 7028 to stop gracefully.* So are we still using jetty? Are we still dependent on the war file? With Regards Aman Tandon
Re: Encrypt Data in SOLR
You could simply hash the value before sending it to Solr and then hash the user query before sending it to Solr as well. Do you need or want only exact matches, or do you need keyword search, wildcards, etc? -- Jack Krupansky On Fri, Feb 27, 2015 at 4:38 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Don't store it? stored=false, indexed=true You may need bit more details really. There is no encryption, if you encrypt it, you cannot search it. So, you should concentrate on security of access instead and/or full-disk encryption (at the cost to performance) Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 27 February 2015 at 16:31, EXTERNAL Taminidi Ravi (ETI, AA-AS/PAS-PTS) external.ravi.tamin...@us.bosch.com wrote: HI, Do Any one know how to encrypt the Solr Data Stored ? e.g. I have to do a search on the address for the customer, but the data should be not able to read by naked eyes? Thanks Ravi
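Jack's hashing idea in a minimal form: normalize, hash at index time, and apply the identical transformation to the query term, so only exact matches remain possible. A sketch using sha256sum (the lowercasing step is just an assumed normalization; pick whatever canonical form fits your data):

```shell
# The same normalization + hash must be applied at index time and query time
hash_addr() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | sha256sum | cut -d' ' -f1
}

hash_addr "123 Main St"    # 64-char hex digest, safe to index and query on
hash_addr "123 main st"    # identical digest: case-normalized queries still match
```

As Jack notes, this only supports exact matches; keyword search, wildcards, or partial matches on the address are lost once it is hashed.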
Re: deploying solr5 in Jetty9 - error No context on this server matched or handled this request. ?
On 2/27/2015 10:59 AM, h15...@mailas.com wrote: I'm trying a 1st deploy of Solr 5.0.0 in Jetty9 (jetty-distribution-9.2.9.v20150224). snip I've obviously misconfigured something. Appreciate any help figuring out what! Add-on info to the replies you have already received: This may be harder to do than simply deploying the war. I hope you can find a way to make it work, I'm just letting you know that it may not be easy. Some time ago, I tried to upgrade the whole project to Jetty 9 and Solr wouldn't compile. Somebody else took over the issue and we are now running Jetty 9 in trunk, but 5.x is still using Jetty 8. https://issues.apache.org/jira/browse/SOLR-4839 Looking at the 2015/01/05 patch (which I believe is the one that actually got committed, along with the very small 2015/01/07 patch), it looks like a very significant amount of work was required within the Solr codebase. I tried to replace the jars in my own custom install with jetty 9, and I can't get jetty to start at all -- the jetty config has radically changed and I do not know how to adapt it. I would strongly recommend that you simply run Solr 5 with the jetty server (and the bin/solr script) that's included in the binary download. At some point in the future, you will not be able to deploy Solr in your own servlet container -- it will be a standalone application, NOT a war. Thanks, Shawn
Re: deploying solr5 in Jetty9 - error No context on this server matched or handled this request. ?
: In 5.0 the new way is to not use a servlet container, just use the : start/stop scripts. More specifically... https://cwiki.apache.org/confluence/display/solr/Major+Changes+from+Solr+4+to+Solr+5 Internally, Solr is still implemented via Servlet APIs and is powered by Jetty -- but this is simply an implementation detail. Deployment as a webapp to other Servlet Containers (or other instances of Jetty) is not supported, and may not work in future 5.x versions of Solr when additional changes are likely to be made to Solr internally to leverage custom networking stack features. -Hoss http://www.lucidworks.com/
Re: Log numfound, qtime, ...
Did you check Kibana/Banana? On Fri, Feb 27, 2015 at 2:07 PM, bengates benga...@aliceadsl.fr wrote: Hello everyone,

Here's my need: I'd like to log Solr responses so as to gather some business statistics. I'd like to report, on a daily/weekly/yearly/whateverly basis, the following KPIs:
- Most popular requests (hits)
- Average numfound for each request
- Average response time for each request
- Requests that have returned an error
- Requests that have a numfound of 0.

The idea is to give the searchandizer the keys to analyze and enhance, in real time, the relevancy of his data. I think it's not the job of a developer to detect that the keyword TV never has results because Television is the referring word in the whole catalog, for instance. The searchandizer should be able to analyze this at any time and provide the correct synonyms to improve relevance.

I'm using Solr with PHP and the Solarium library. Actually, the only way I found to manage this is the following:
1. The user sends the request
2. Nginx intercepts the request and forwards it to a PHP app
3. The PHP app loads the Solarium library and forwards the request to Solr/Jetty
4. Solr replies with JSON and Solarium turns it into a PHP Solarium Response object
5. The PHP app sends the user the raw JSON through Nginx (as if it were Jetty)
6. The PHP app stores the query, the QTime and the numfound in a database

I think I'll soon run into performance issues, as you can guess. Do you know a better approach? Thanks, Ben

-- View this message in context: http://lucene.472066.n3.nabble.com/Log-numfound-qtime-tp4189561.html Sent from the Solr - User mailing list archive at Nabble.com. -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
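Step 6 of Ben's pipeline, extracting the KPIs from a Solr response, can be sketched like this. This is an illustration in Python rather than Ben's PHP/Solarium stack; the response shape follows Solr's standard `wt=json` output, and the in-memory dict stands in for whatever database or log shipper (e.g. the Kibana/Banana route Mikhail suggests) would be used in production.

```python
# Accumulate per-query KPIs (hits, total QTime, total numFound, zero-result
# count) from parsed Solr JSON responses. Illustrative sketch only.
from collections import defaultdict

stats = defaultdict(lambda: {
    "hits": 0, "qtime_total": 0, "numfound_total": 0, "zero_results": 0,
})

def record(response: dict) -> None:
    """Extract q, QTime and numFound from one Solr JSON response."""
    q = response["responseHeader"]["params"].get("q", "*:*")
    qtime = response["responseHeader"]["QTime"]
    numfound = response["response"]["numFound"]
    s = stats[q]
    s["hits"] += 1
    s["qtime_total"] += qtime
    s["numfound_total"] += numfound
    if numfound == 0:
        s["zero_results"] += 1

# Example: one response for the query "tv" that returned nothing --
# exactly the zero-results case Ben wants the searchandizer to spot.
record({"responseHeader": {"QTime": 3, "params": {"q": "tv"}},
        "response": {"numFound": 0, "docs": []}})
```

Writing these rows to a log file and indexing them with Logstash/Banana would avoid putting the bookkeeping on the request path, which is where the performance concern in Ben's proxy design comes from.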
Re: Solr logs encoding
I have seen this log earlier; I just changed the log level of this class to WARN. On Feb 27, 2015 12:03 AM, Moshe Recanati mos...@kmslh.com wrote: Hi, I've got a weird situation. Since yesterday's restart I've had an issue with log encoding. My log looks like: DEBUG - 2015-02-27 10:47:01.432; [0x4][0xfc][0xff][0xff][0xff][0xf][0x4][0xc7]8[0x4][0xfc][0xff][0xff][0xff][0xf][0x4][0x89][0x5][0x4][0xfc][0xff][0xff][0xff][0xf][0x4][0x97][0x4] [...same byte pattern repeating for several hundred characters...] Anyone familiar with this? How to fix it?
*Regards,* *Moshe Recanati* *SVP Engineering* Office + 972-73-2617564 Mobile + 972-52-6194481 Skype: recanati More at: www.kmslh.com
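Aman's fix of raising the offending logger to WARN is a log4j configuration change. A minimal sketch, assuming the default log4j setup that Solr 4.x/5.x ship with (`server/resources/log4j.properties`); the logger name below is hypothetical, since the class emitting the garbled DEBUG lines isn't visible in Moshe's snippet:

```properties
# Hypothetical example: silence one chatty DEBUG logger without changing
# the global rootLogger level. Replace com.example.NoisyClass with the
# class name that appears in your own log output.
log4j.logger.com.example.NoisyClass=WARN
```

The same effect can be had at runtime, without a restart, through the Logging section of the Solr admin UI.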