RE: Using the date field for searching
You can use a filter query and form the date range as follows when a user enters just the year, or the year and month:

If just the year (1885) was entered: date:[1885-01-01T00:00:00Z TO 1886-01-01T00:00:00Z]
If the year and month (1885-06) were entered: date:[1885-06-01T00:00:00Z TO 1885-07-01T00:00:00Z]

Alternatively, use DateRangeField as described at the bottom of the following page: https://cwiki.apache.org/confluence/display/solr/Working+with+Dates

-Sagar

-----Original Message-----
From: Scott Derrick [mailto:sc...@tnstaafl.net]
Sent: Tuesday, August 11, 2015 3:02 PM
To: solr-user@lucene.apache.org
Subject: Using the date field for searching

If I query date:1885 I get an error:

org.apache.solr.common.SolrException: Invalid Date String:'1885'

If I query date:1885* I get no results, and yet there are numerous docs with a year of 1885 in the date string, like so:

<arr name="date"><date>1885-02-08T00:00:00Z</date></arr>

If I query date:1885-02-08T00:00:00Z I get 9 results??

Do the users really have to specify a full XML-compliant date string to use the date: field for searching?

thanks,
Scott
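Sagar's two expansions can be generalized in a few lines. A minimal sketch (assumptions: the field is named `date`, and input has already been validated to be `YYYY` or `YYYY-MM`):

```python
def date_clause(user_input: str, field: str = "date") -> str:
    """Turn a partial date like '1885' or '1885-06' into a Solr range clause."""
    parts = user_input.split("-")
    if len(parts) == 1:                       # year only
        year = int(parts[0])
        start = f"{year:04d}-01-01T00:00:00Z"
        end = f"{year + 1:04d}-01-01T00:00:00Z"
    elif len(parts) == 2:                     # year and month
        year, month = int(parts[0]), int(parts[1])
        start = f"{year:04d}-{month:02d}-01T00:00:00Z"
        if month == 12:                       # December rolls over to next year
            end = f"{year + 1:04d}-01-01T00:00:00Z"
        else:
            end = f"{year:04d}-{month + 1:02d}-01T00:00:00Z"
    else:
        raise ValueError("expected YYYY or YYYY-MM")
    return f"{field}:[{start} TO {end}]"

# date_clause("1885")    -> date:[1885-01-01T00:00:00Z TO 1886-01-01T00:00:00Z]
# date_clause("1885-06") -> date:[1885-06-01T00:00:00Z TO 1885-07-01T00:00:00Z]
```

Note this mirrors the reply's inclusive brackets on both ends; an exclusive upper bound (`}` instead of `]`) would avoid matching a document stamped exactly at the start of the next period.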
Re: Solr MLT with stream.body returns different results on each shard
: I have a fresh install of Solr 5.2.1 with about 3 million docs freshly
: indexed (I can also reproduce this issue on 4.10.0). When I use the Solr
: MoreLikeThisHandler with content stream I'm getting different results per
: shard.

I haven't looked at the code recently, but I'm 99% certain that the MLT handler in general doesn't work with distributed (ie: sharded) queries (unlike the MLT component and the recently added MLT qparser).

I suspect that in the specific case of stream.body, what you are seeing is that the interesting terms are being computed relative to the local tf/idf stats for that shard, and then only local results from that shard are being returned.

: I also looked at using a standard MLT query, but I need to be able to
: stream in a fairly large block of text for comparison that is not in the
: index (different type of document). A standard MLT query

Until/unless the MLT qparser supports arbitrary text (there's some mention of this in SOLR-7639, but I'm not sure what the status of that is), you might find that just POSTing all of your text as a regular query (q) using dismax or edismax is suitable for your needs. That's essentially the equivalent of what MLTHandler does with a stream.body, except it tries to focus only on interesting terms based on tf/idf; but if your fields are all configured with stopword files anyway, then the results and performance may be similar.

-Hoss
http://www.lucidworks.com/
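Hoss's workaround (POST the whole block of text as an ordinary edismax query instead of using the MLT handler's stream.body) can be sketched as follows. The field names and parameter choices here are placeholder assumptions, not from the thread:

```python
from urllib.parse import urlencode

def edismax_body(text: str, qf: str = "title body") -> str:
    """Build the form-encoded body for a POST to /solr/<collection>/select.

    POSTing (rather than a GET URL) sidesteps URL-length limits when the
    query text is a large block of free text.
    """
    return urlencode({
        "defType": "edismax",
        "q": text,        # the entire block of text becomes the query
        "qf": qf,         # fields to match against -- placeholder names
        "rows": "10",
        "wt": "json",
    })
```

Unlike the MLT handler, nothing here filters the text down to "interesting" terms, which is why Hoss notes stopword-configured fields make the two approaches behave similarly.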
Re: Core mismatch in org.apache.solr.update.StreamingSolrClients Errors for ConcurrentUpdateSolrClient
Okay, to make everything clear, here are the steps:

- Creating configs etc. and then running:
  ./zkcli.sh -cmd upconfig -n CoreA -d /path/to/core/configs/CoreA/conf/ -z zk1:2181,zk2:2182,zk3:2183
- Then going to http://someserver:8983/solr/#/~cores
- Clicking Add Core: http://lucene.472066.n3.nabble.com/file/n4222345/Screen_Shot_2015-08-11_at_14.png
- Repeating the last step on the other node as well

So this is invalid (incl. https://wiki.apache.org/solr/CoreAdmin)?

-----
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: http://lucene.472066.n3.nabble.com/Core-mismatch-in-org-apache-solr-update-StreamingSolrClients-Errors-for-ConcurrentUpdateSolrClient-tp4222335p4222345.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: Solr old log files are not archived or removed automatically.
Hi Erick,

1. How did you install/run your Solr? As a service or regular? See the reference guide, "Permanent Logging Settings", for some info on the difference there.

What is the difference between regular and service?

2. What does your log4j.properties file look like?

Here are the contents of the log4j.properties file:

# Logging level
solr.log=logs
log4j.rootLogger=INFO, file, CONSOLE

log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%-4r [%t] %-5p %c %x [%X{collection} %X{shard} %X{replica} %X{core}] \u2013 %m%n

#- size rotation with log cleanup.
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.MaxFileSize=4MB
log4j.appender.file.MaxBackupIndex=9

#- File to log to and log format
#log4j.appender.file.File=${solr.log}/solr.log
log4j.appender.file.File=C:/solr_logs/solr.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%-5p - %d{yyyy-MM-dd HH:mm:ss.SSS}; [%X{collection} %X{shard} %X{replica} %X{core}] %C; %m\n

log4j.logger.org.apache.zookeeper=WARN
log4j.logger.org.apache.hadoop=WARN

# set to INFO to enable infostream log messages
log4j.logger.org.apache.solr.update.LoggingInfoStream=OFF

I am not sure how best I can limit the size of the solr_logs directory. Does log4j come with a feature to remove old log files with a given retention period?

Best regards,
Adrian Liew

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Monday, August 10, 2015 11:36 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr old log files are not archived or removed automatically.

1. How did you install/run your Solr? As a service or regular? See the reference guide, "Permanent Logging Settings", for some info on the difference there.

2. What does your log4j.properties file look like?

Best,
Erick

On Mon, Aug 10, 2015 at 12:13 AM, Adrian Liew <adrian.l...@avanade.com> wrote:

Hi there,

I am using Solr v5.2.1 on my local machine. I realized that old log files are not removed in a timely manner by log4j. The logs which I am referring to are the log files that reside within solr_directory\server\logs. So far I have the previous two months' worth of log files accumulated in the log directory. Consequently, this causes my directory to grow to a large size. I will need to manually remove the old log files, which is undesirable.

Is this a bug with Solr or a missing configuration that needs to be set? As far as I know, all Solr logging configuration is done in solr_directory\server\resources\log4j.properties.

Appreciate the soonest reply. Thanks.
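On Adrian's last question: log4j 1.x's RollingFileAppender only caps the number of backup files (MaxBackupIndex), not their age, so time-based retention usually needs an external step. A hedged sketch of such a scheduled cleanup; the directory, suffix, and 30-day retention are assumptions, not values from the thread:

```python
import os
import time

def remove_old_logs(log_dir, retention_days=30, suffix=".log"):
    """Delete files in log_dir whose mtime is older than retention_days."""
    cutoff = time.time() - retention_days * 86400
    removed = []
    for name in os.listdir(log_dir):
        path = os.path.join(log_dir, name)
        # only plain files that look like logs (solr.log, solr.log.1, ...),
        # never subdirectories
        if os.path.isfile(path) and suffix in name and os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed.append(name)
    return removed
```

Run from a cron job or Windows scheduled task, e.g. `remove_old_logs("C:/solr_logs")`, alongside the size-based rotation log4j already does.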
Re: Make search faster in Solr
Okay davidphilip.

On Mon, Aug 10, 2015 at 8:24 PM davidphilip cherian <davidphilipcher...@gmail.com> wrote:

Hi Nitin,

32 shards for 16 million documents is too much. 2 shards should suffice, considering your document sizes are moderate. Caches are to be monitored and tuned accordingly. You should study caches a bit here:
https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig

On Mon, Aug 10, 2015 at 4:34 PM, Nitin Solanki <nitinml...@gmail.com> wrote:

Hi,

I have 32 shards and a single replica of each shard, having 4 nodes over Solr cloud. I have indexed 16 million documents. Without cache, the total time taken to search a document is 0.2 seconds, and with cache it is 0.04 seconds. I don't do anything with caches; they are set by default in solrconfig.xml.

How can I make search faster without cache? Or how can I make it even faster with cache while searching? Which cache is used for searching?
Re: Core mismatch in org.apache.solr.update.StreamingSolrClients Errors for ConcurrentUpdateSolrClient
Thanks for the details Anshum :) I've got one more question: could this kind of error logging also be triggered by the amount of incoming requests? I can see these errors only on the prod env, while the testing env is totally fine, although the creation process is exactly the same.

-----
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: http://lucene.472066.n3.nabble.com/Core-mismatch-in-org-apache-solr-update-StreamingSolrClients-Errors-for-ConcurrentUpdateSolrClient-tp4222335p4222348.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Concurrent Indexing and Searching in Solr.
Hi Erick,

Thanks a lot for your help. I will go through MongoDB.

On Mon, Aug 10, 2015 at 9:14 PM Erick Erickson <erickerick...@gmail.com> wrote:

bq: I changed <maxWarmingSearchers>2</maxWarmingSearchers> to <maxWarmingSearchers>100</maxWarmingSearchers>. And apply simultaneous searching using 100 workers.

Do not do this. This has nothing to do with the number of searcher threads. And with your update rate, especially if you continue to insist on adding commit=true to every update request, this will explode your memory requirements. To no good purpose whatsoever.

bq: But MongoDB can handle concurrent searching and indexing faster.

Because MongoDB is optimized for different kinds of operations. Solr is a ranking, free-text search engine. It's an apples-and-oranges comparison. If MongoDB meets your search needs, you should use it.

Best,
Erick

On Sun, Aug 9, 2015 at 11:04 PM, Nitin Solanki <nitinml...@gmail.com> wrote:

Hi,

I used Solr version 5.2.1. It is fast, I think. But again, I am stuck on concurrent searching and threading. I changed <maxWarmingSearchers>2</maxWarmingSearchers> to <maxWarmingSearchers>100</maxWarmingSearchers> and applied simultaneous searching using 100 workers. It works fast but not up to the mark: it improves search time from 1.5 to 0.5 seconds. But if I run only a single worker then the search time is 0.03 seconds, which is very fast but not achievable with 100 workers running simultaneously.

As Shawn said: "Making 100 concurrent indexing requests at the same time as 100 concurrent queries will overwhelm *any* single Solr server." I got your point. But MongoDB can handle concurrent searching and indexing faster. Then why not Solr? Sorry for this..

On Mon, Aug 10, 2015 at 2:39 AM Shawn Heisey <apa...@elyograg.org> wrote:

On 8/7/2015 1:15 PM, Nitin Solanki wrote:

: I wrote a python script for indexing and using urllib and urllib2 for
: indexing data via http.

There are a number of Solr python clients. Using a client makes your code much easier to write and understand.

https://wiki.apache.org/solr/SolPython

I have no experience with any of these clients, but I can say that the one encountered most often when Python developers come into the #solr IRC channel is pysolr. Our wiki page says the last update for pysolr happened in December of 2013, but I can see that the last version on their web page is dated 2015-05-26.

Making 100 concurrent indexing requests at the same time as 100 concurrent queries will overwhelm *any* single Solr server. In a previous message you said that you have 4 CPU cores. The load you're trying to put on Solr will require at *LEAST* 200 threads. It may be more than that. Any single system is going to have trouble with that. A system with 4 cores will be *very* overloaded.

Thanks,
Shawn
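The advice in this thread boils down to: drop commit=true from every request, send documents in batches, and let autoCommit handle committing. A hedged sketch of the batching side (the batch size of 500 is an assumption, not a recommendation from the thread):

```python
def batches(docs, size=500):
    """Yield successive lists of at most `size` docs, for bulk /update posts.

    One HTTP request per batch, with no explicit commit parameter, keeps
    Solr from opening a new searcher for every document.
    """
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:          # flush the final partial batch
        yield batch
```

Each yielded list would then be posted to /update by whichever client (pysolr etc.) is in use, while solrconfig.xml's autoCommit/autoSoftCommit settings control visibility.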
Re: Core mismatch in org.apache.solr.update.StreamingSolrClients Errors for ConcurrentUpdateSolrClient
How did you create your collections? Also, is that verbatim from the logs, or is it just because you obfuscated that part while posting it here?

On Mon, Aug 10, 2015 at 11:02 PM, deniz <denizdurmu...@gmail.com> wrote:

Hello Anshum, thanks for the quick reply. I know it is being forwarded from one node to the leader node, but for collection names, it shows different collections while the master node address is correct. Dunno if I am missing some points, but my concern is the bold parts below:

ERROR - 2015-08-11 05:04:34.592; [*CoreA* shard1 core_node2 *CoreA*] org.apache.solr.update.StreamingSolrClients$1; error
org.apache.solr.common.SolrException: Bad Request
request: http://server:8983/solr/*CoreB*/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fserver2%3A8983%2Fsolr%2F*CoreB*%2F&wt=javabin&version=2

So this is also normal?

Anshum Gupta wrote:

Hi Deniz,

Seems like the update that's being forwarded from a non-leader (the original node that received the request) is failing. This could be due to multiple reasons, including an issue with your schema vs. the document that you sent.

To elaborate more, here's how a typical batched request in SolrCloud works:
1. Batch sent from client.
2. Received by node X.
3. All documents that have their shard leader on node X are processed and distributed to the replicas by node X. All other documents, which belong to a shard whose leader isn't on node X, get forwarded using the ConcurrentUpdateSolrClient to their respective leaders.

There's nothing *strange* about this log, other than the fact that the update failed (and it would have failed even if you had sent the document directly to this node).

Hope this made things clear.

--
Anshum Gupta

-----
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: http://lucene.472066.n3.nabble.com/Core-mismatch-in-org-apache-solr-update-StreamingSolrClients-Errors-for-ConcurrentUpdateSolrClient-tp4222335p4222338.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
Anshum Gupta
RE: SolrNet and deep pagination
Thanks Chris. We opted to use v0.5, which is an alpha version. And yes, I should be referring to the SolrNet Google Group. Thanks for your help.

Regards,
Adrian

-----Original Message-----
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: Tuesday, August 11, 2015 5:17 AM
To: solr-user@lucene.apache.org
Cc: Chong Kah Heng <chong.kah.h...@avanade.com>
Subject: Re: SolrNet and deep pagination

: Has anyone worked with deep pagination using SolrNet? The SolrNet
: version that I am using is v0.4.0.2002. I followed up with this article,
: https://github.com/mausch/SolrNet/blob/master/Documentation/CursorMark.md
: , however the version of SolrNet.dll does not expose a StartOrCursor
: property in the QueryOptions class.

I don't know anything about SolrNet, but I do know that the URL you list above is for the documentation on the master branch. If I try to look at the same document on the 0.4.x branch, that document doesn't exist -- suggesting the feature isn't supported in the version of SolrNet you are using...

https://github.com/mausch/SolrNet/blob/0.4.x/Documentation/CursorMark.md
https://github.com/mausch/SolrNet/tree/0.4.x/Documentation

In fact, if I search the repo for StartOrCursor, I see a file named StartOrCursor.cs exists on the master branch, but not on the 0.4.x branch...

https://github.com/mausch/SolrNet/blob/master/SolrNet/StartOrCursor.cs
https://github.com/mausch/SolrNet/blob/0.4.x/SolrNet/StartOrCursor.cs

...so it seems unlikely that this (class?) is supported in the release you are using.

Note: according to the docs, there is a SolrNet Google group where this question is probably more appropriate:

https://github.com/mausch/SolrNet/blob/master/Documentation/README.md
https://groups.google.com/forum/#!forum/solrnet

-Hoss
http://www.lucidworks.com/
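Client library aside, cursorMark paging is itself a small loop: sort on the uniqueKey, pass the cursor back, and stop when Solr returns the same cursor twice. A hedged sketch written against a generic `fetch(params) -> dict` callable so it is not tied to SolrNet or any other client; the `id` sort field is an assumed uniqueKey:

```python
def iter_all_docs(fetch, rows=100):
    """Deep-page through all results via cursorMark.

    `fetch` takes a dict of query params and returns the parsed JSON
    response. The loop ends when the returned cursor stops changing.
    """
    cursor = "*"                  # initial cursor value
    while True:
        resp = fetch({
            "q": "*:*",
            "sort": "id asc",     # cursorMark requires a sort ending on uniqueKey
            "rows": rows,
            "cursorMark": cursor,
        })
        for doc in resp["response"]["docs"]:
            yield doc
        next_cursor = resp["nextCursorMark"]
        if next_cursor == cursor:  # unchanged cursor means no more results
            break
        cursor = next_cursor
```

Unlike start/rows paging, each page costs the same regardless of depth, which is the point of the CursorMark feature the 0.4.x SolrNet build lacks.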
Re: Cluster down for long time after zookeeper disconnection
1. Erick, thanks. I agree that it is really serious, but I think that the 3 minutes in this case were not mandatory. In my case it was a deadlock, which smells like some kind of bug: one replica is waiting for the other to come up before it takes leadership, while the other is waiting for the election results. If I am able to reproduce it on 5.2.1, is it legitimate to file a JIRA issue for that?

2. Regarding session timeouts, there's something about the configuration that I don't understand. If zkClientTimeout is set to 30 seconds, how come I see in the log that the session expired after ~50 seconds? Maybe I have a mismatch between the zookeeper and solr configuration?

3. Resuming the question of the leaderVoteWait parameter, I have seen in a few threads that it may be reduced to a minimum. I'm not clear about the full meaning, but I understand that it is meant to prevent loss of updates on cluster startup. Can anyone confirm/clarify that?

Links for leaderVoteWait:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201307.mbox/%3ccajt9wnhivirpn79kttcn8ekafevhhmqwkfl-+i16kbz0ogl...@mail.gmail.com%3E
http://qnalist.com/questions/4812859/waitforleadertoseedownstate-when-leader-is-down

Relevant part from my zookeeper conf:
tickTime=2000
initLimit=10
syncLimit=5

On Tue, Aug 11, 2015 at 1:06 AM, Erick Erickson <erickerick...@gmail.com> wrote:

Not that I know of. With ZK as the one source of truth, dropping below quorum is Really Serious, so having to wait 3 minutes or so for action to be taken is the fallback.

Best,
Erick

On Mon, Aug 10, 2015 at 1:34 PM, danny teichthal <dannyt...@gmail.com> wrote:

Erick, I assume you are referring to zkClientTimeout; it is set to 30 seconds. I also see these messages on the Solr side:

Client session timed out, have not heard from server in 48865ms for sessionid 0x44efbb91b5f0001, closing socket connection and attempting reconnect.

So, I'm not sure what the actual disconnection duration was, but it could have been up to a minute. We are working on finding the network issues' root cause, but assuming disconnections will always occur, are there any other options to overcome this issue?

On Mon, Aug 10, 2015 at 11:18 PM, Erick Erickson <erickerick...@gmail.com> wrote:

I didn't see the ZK timeout you set (just skimmed). But if your Zookeeper was down _very_ temporarily, it may suffice to up the ZK timeout. The default in the 4.10 time-frame (if I remember correctly) was 15 seconds, which has proven to be too short in many circumstances. Of course, if your ZK was down for minutes this wouldn't help.

Best,
Erick

On Mon, Aug 10, 2015 at 1:06 PM, danny teichthal <dannyt...@gmail.com> wrote:

Hi Alexander,

Thanks for your reply. I looked at the release notes. There is one bug fix - SOLR-7503 (https://issues.apache.org/jira/browse/SOLR-7503) - register cores asynchronously. It may reduce the registration time since it is done in parallel, but still, 3 minutes (leaderVoteWait) is a long time to recover from a few seconds of disconnection. Other than that one, I don't see any bug fix that addresses the same problem.

I am able to reproduce it on 4.10.4 pretty easily; I will also try it with 5.2.1 and see if it reproduces. Anyway, since migrating to 5.2.1 is not an option for me in the short term, I'm left with the question of whether reducing leaderVoteWait may help here, and what the consequences may be. If I understand correctly, there might be a chance of losing updates that were made on the leader. From my side it is a lot worse to lose availability for 3 minutes. I would really appreciate feedback on this.

On Mon, Aug 10, 2015 at 6:55 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:

Did you look at the release notes for Solr versions after your own? I am pretty sure some similar things were identified and/or resolved for 5.x. It may not help if you cannot migrate, but it would at least give a confirmation and maybe a workaround on what you are facing.

Regards,
Alex.
----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/

On 10 August 2015 at 11:37, danny teichthal <dannyt...@gmail.com> wrote:

Hi,

We are using Solr cloud with Solr 4.10.4. Over the past week we encountered a problem where all of our servers disconnected from the zookeeper cluster. This might be OK; the problem is that after reconnecting to zookeeper, it looks like for every collection both replicas do not have a leader and are stuck in some kind of a deadlock for a few minutes.

From what we understand: one of the replicas assumes it will be the leader and at some point starts to wait on leaderVoteWait, which is by default 3 minutes. The other replica is stuck on this part of the code for a few minutes: at
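On question 2 above: the session timeout Solr requests is not necessarily what it gets. ZooKeeper negotiates the session timeout within bounds derived from its own configuration (by default between 2*tickTime and 20*tickTime), so the Solr-side and ZooKeeper-side settings can disagree. A hedged sketch of the Solr side; the value is illustrative, not taken from this thread:

```xml
<!-- solr.xml (sketch): the session timeout Solr *requests* from ZooKeeper -->
<solrcloud>
  <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
</solrcloud>
```

With the zoo.cfg shown above (tickTime=2000), the server-side default cap would be 20*2000 = 40000 ms unless maxSessionTimeout is set explicitly; also note the "have not heard from server in 48865ms" message reports time since the last heartbeat, not the negotiated timeout itself.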
Re: Core mismatch in org.apache.solr.update.StreamingSolrClients Errors for ConcurrentUpdateSolrClient
Hello Anshum, thanks for the quick reply. I know it is being forwarded from one node to the leader node, but for collection names, it shows different collections while the master node address is correct. Dunno if I am missing some points, but my concern is the bold parts below:

ERROR - 2015-08-11 05:04:34.592; [*CoreA* shard1 core_node2 *CoreA*] org.apache.solr.update.StreamingSolrClients$1; error
org.apache.solr.common.SolrException: Bad Request
request: http://server:8983/solr/*CoreB*/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fserver2%3A8983%2Fsolr%2F*CoreB*%2F&wt=javabin&version=2

So this is also normal?

Anshum Gupta wrote:

Hi Deniz,

Seems like the update that's being forwarded from a non-leader (the original node that received the request) is failing. This could be due to multiple reasons, including an issue with your schema vs. the document that you sent.

To elaborate more, here's how a typical batched request in SolrCloud works:
1. Batch sent from client.
2. Received by node X.
3. All documents that have their shard leader on node X are processed and distributed to the replicas by node X. All other documents, which belong to a shard whose leader isn't on node X, get forwarded using the ConcurrentUpdateSolrClient to their respective leaders.

There's nothing *strange* about this log, other than the fact that the update failed (and it would have failed even if you had sent the document directly to this node).

Hope this made things clear.

--
Anshum Gupta

-----
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: http://lucene.472066.n3.nabble.com/Core-mismatch-in-org-apache-solr-update-StreamingSolrClients-Errors-for-ConcurrentUpdateSolrClient-tp4222335p4222338.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Core mismatch in org.apache.solr.update.StreamingSolrClients Errors for ConcurrentUpdateSolrClient
bq. adding it on admin interface of solr

Did you not use the Collections Admin API? If you try to create your own cores using the core admin APIs instead of the Collections Admin API, you could really end up shooting yourself in the foot. Also, the only supported mechanism to create a collection in Solr is via the Collections API.

On Mon, Aug 10, 2015 at 11:13 PM, deniz <denizdurmu...@gmail.com> wrote:

I have created it by simply creating configs and then using upconfig to upload them to zookeeper, then adding it in the admin interface of Solr. I have only changed the IPs of server and server1 and changed the core/collection names to CoreA and CoreB; in the logs CoreA and CoreB are different collections with different names.

-----
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: http://lucene.472066.n3.nabble.com/Core-mismatch-in-org-apache-solr-update-StreamingSolrClients-Errors-for-ConcurrentUpdateSolrClient-tp4222335p4222341.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
Anshum Gupta
Performance warning overlapping onDeckSearchers
Hi there,

Has anyone come across this issue: [some_index] PERFORMANCE WARNING: Overlapping onDeckSearchers=2? I am currently using Solr v5.2.1. What does this mean? Does this raise red flags?

I am currently encountering an issue whereby my Sitecore system is unable to update the index appropriately. I am not sure if this is linked to the warnings above.

Regards,
Adrian
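For context (not from this thread): the warning generally means a new searcher was opened for warming while a previous one was still warming, i.e. commits that open searchers are arriving faster than warming completes. A hedged solrconfig.xml sketch of the settings usually involved; the values are illustrative only, not recommendations:

```xml
<!-- Cap on simultaneously warming searchers; exceeding it fails the commit -->
<maxWarmingSearchers>2</maxWarmingSearchers>

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher> <!-- hard commits then warm no searcher -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>15000</maxTime> <!-- searcher-opening commits throttled to this interval -->
  </autoSoftCommit>
</updateHandler>
```

If a client (such as Sitecore's index update job) sends explicit commits on every update, those bypass this throttling, which is a common trigger for the warning.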
Re: Core mismatch in org.apache.solr.update.StreamingSolrClients Errors for ConcurrentUpdateSolrClient
I have created it by simply creating configs and then using upconfig to upload them to zookeeper, then adding it in the admin interface of Solr. I have only changed the IPs of server and server1 and changed the core/collection names to CoreA and CoreB; in the logs CoreA and CoreB are different collections with different names.

-----
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: http://lucene.472066.n3.nabble.com/Core-mismatch-in-org-apache-solr-update-StreamingSolrClients-Errors-for-ConcurrentUpdateSolrClient-tp4222335p4222341.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Make search faster in Solr
Hi davidphilip,

Without caching, can we search fast?

On Tue, Aug 11, 2015 at 11:43 AM Nitin Solanki <nitinml...@gmail.com> wrote:

Okay davidphilip.

On Mon, Aug 10, 2015 at 8:24 PM davidphilip cherian <davidphilipcher...@gmail.com> wrote:

Hi Nitin,

32 shards for 16 million documents is too much. 2 shards should suffice, considering your document sizes are moderate. Caches are to be monitored and tuned accordingly. You should study caches a bit here:
https://cwiki.apache.org/confluence/display/solr/Query+Settings+in+SolrConfig

On Mon, Aug 10, 2015 at 4:34 PM, Nitin Solanki <nitinml...@gmail.com> wrote:

Hi,

I have 32 shards and a single replica of each shard, having 4 nodes over Solr cloud. I have indexed 16 million documents. Without cache, the total time taken to search a document is 0.2 seconds, and with cache it is 0.04 seconds. I don't do anything with caches; they are set by default in solrconfig.xml.

How can I make search faster without cache? Or how can I make it even faster with cache while searching? Which cache is used for searching?
Re: Core mismatch in org.apache.solr.update.StreamingSolrClients Errors for ConcurrentUpdateSolrClient
It's not entirely invalid, but the only supported mechanism to create collections is via the Collections Admin API:
https://cwiki.apache.org/confluence/display/solr/Collections+API

On Mon, Aug 10, 2015 at 11:53 PM, deniz <denizdurmu...@gmail.com> wrote:

Okay, to make everything clear, here are the steps:

- Creating configs etc. and then running:
  ./zkcli.sh -cmd upconfig -n CoreA -d /path/to/core/configs/CoreA/conf/ -z zk1:2181,zk2:2182,zk3:2183
- Then going to http://someserver:8983/solr/#/~cores
- Clicking Add Core: http://lucene.472066.n3.nabble.com/file/n4222345/Screen_Shot_2015-08-11_at_14.png
- Repeating the last step on the other node as well

So this is invalid (incl. https://wiki.apache.org/solr/CoreAdmin)?

-----
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: http://lucene.472066.n3.nabble.com/Core-mismatch-in-org-apache-solr-update-StreamingSolrClients-Errors-for-ConcurrentUpdateSolrClient-tp4222335p4222345.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
Anshum Gupta
Deadlock-like behavior when new IndexWriter created
Hi,

I have Solr 5.2.1 set up in a master-slave configuration. Very often it happens that the Solr slave starts replicating (I can see it in the admin panel) but gets stuck at 0% and never proceeds further. Usually a restart of the slave helps.

Relevant logs from the slave:

INFO - 2015-08-11 07:56:00.184; org.apache.solr.handler.IndexFetcher; Master's generation: 26
INFO - 2015-08-11 07:56:00.188; org.apache.solr.handler.IndexFetcher; Slave's generation: 25
INFO - 2015-08-11 07:56:00.189; org.apache.solr.handler.IndexFetcher; Starting replication process
INFO - 2015-08-11 07:56:00.205; org.apache.solr.handler.IndexFetcher; Number of files in latest index in master: 10
INFO - 2015-08-11 07:56:00.209; org.apache.solr.core.CachingDirectoryFactory; return new directory for /var/solr/data/catalog_article_1_de_DE/data/index.20150811075600209
*INFO - 2015-08-11 07:56:00.212; org.apache.solr.update.DefaultSolrCoreState; Creating new IndexWriter...*
*INFO - 2015-08-11 07:56:00.221; org.apache.solr.update.DefaultSolrCoreState; Waiting until IndexWriter is unused... core=catalog_article_1_de_DE*
INFO - 2015-08-11 07:56:00.522; org.apache.solr.core.SolrCore; [catalog_article_1_de_DE] webapp=/solr path=/select params={} hits=0 status=0 QTime=1
INFO - 2015-08-11 07:56:03.654; org.apache.solr.core.SolrCore; [catalog_article_1_de_DE] webapp=/solr path=/select params={} hits=0 status=0 QTime=1

Here are the relevant solrconfig.xml entries:

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.catalog_article_1_de_DE.data.dir:}</str>
  </updateLog>
  <autoCommit>
    <maxDocs>1</maxDocs>
    <maxTime>30</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>15000</maxTime>
  </autoSoftCommit>
</updateHandler>

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">optimize</str>
    <str name="confFiles">schema.xml,solrconfig.xml</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">${master.url:127.0.0.1:8983}/${solr.core.name}</str>
    <str name="pollInterval">00:01:00</str>
  </lst>
</requestHandler>

Has anybody faced the same problem? Is it a master or slave issue? How can I debug/fix the problem?

Thanks,
Andrii
Re: Count of distinct values in faceting.
Please read docVlaues as docValues in my mail above.

Regards,
Modassar

On Tue, Aug 11, 2015 at 4:01 PM, Modassar Ather <modather1...@gmail.com> wrote:

Hi,

The count of distinct values can be retrieved in the following ways. Please note that the Solr version is 5.2.1.

1. Using cardinality=true.
2. Using the hll() facet function.

Kindly help me understand:

1. How accurate are they comparatively, and which performs better with millions of documents?
2. Per my understanding, {!cardinality=1.0} returns the most accurate result. Is my understanding correct, and if yes, is it 100% accurate?
3. How accurate is the result returned by the hll() function?
4. I am getting the following exception for the query q=field:query&stats=true&stats.field={!cardinality=1.0}field. The exception is not seen once the cardinality is set to 0.9 or less. The field is docValues enabled and indexed=false. I tried to reproduce the same exception on a non-docValues field but could not. Please help me resolve the issue.

ERROR - 2015-08-11 12:24:00.222; [core] org.apache.solr.common.SolrException; null:java.lang.ArrayIndexOutOfBoundsException: 3
    at net.agkn.hll.serialization.BigEndianAscendingWordSerializer.writeWord(BigEndianAscendingWordSerializer.java:152)
    at net.agkn.hll.util.BitVector.getRegisterContents(BitVector.java:247)
    at net.agkn.hll.HLL.toBytes(HLL.java:917)
    at net.agkn.hll.HLL.toBytes(HLL.java:869)
    at org.apache.solr.handler.component.AbstractStatsValues.getStatsValues(StatsValuesFactory.java:348)
    at org.apache.solr.handler.component.StatsComponent.convertToResponse(StatsComponent.java:151)
    at org.apache.solr.handler.component.StatsComponent.process(StatsComponent.java:62)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:255)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.Server.handle(Server.java:497)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
    at java.lang.Thread.run(Thread.java:745)

Thanks,
Modassar
Count of distinct values in faceting.
Hi,

The count of distinct values can be retrieved in the following ways. Please note that the Solr version is 5.2.1.

1. Using cardinality=true.
2. Using the hll() facet function.

Kindly help me understand:

1. How do the two compare in accuracy, and which performs better with millions of documents?
2. Per my understanding, {!cardinality=1.0} returns the most accurate result. Is my understanding correct, and if yes, is it 100% accurate?
3. How accurate is the result returned by the hll() function?
4. I am getting the following exception for the query q=field:query&stats=true&stats.field={!cardinality=1.0}field. The exception is not seen once the cardinality is set to 0.9 or less. The field is docValues enabled and indexed=false. I tried to reproduce the same exception on a non-docValues field but could not. Please help me resolve the issue.

ERROR - 2015-08-11 12:24:00.222; [core] org.apache.solr.common.SolrException; null:java.lang.ArrayIndexOutOfBoundsException: 3
  at net.agkn.hll.serialization.BigEndianAscendingWordSerializer.writeWord(BigEndianAscendingWordSerializer.java:152)
  at net.agkn.hll.util.BitVector.getRegisterContents(BitVector.java:247)
  at net.agkn.hll.HLL.toBytes(HLL.java:917)
  at net.agkn.hll.HLL.toBytes(HLL.java:869)
  at org.apache.solr.handler.component.AbstractStatsValues.getStatsValues(StatsValuesFactory.java:348)
  at org.apache.solr.handler.component.StatsComponent.convertToResponse(StatsComponent.java:151)
  at org.apache.solr.handler.component.StatsComponent.process(StatsComponent.java:62)
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:255)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:2064)
  at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
  at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:450)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
  at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
  at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
  at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
  at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
  at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
  at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
  at org.eclipse.jetty.server.Server.handle(Server.java:497)
  at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
  at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
  at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
  at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
  at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
  at java.lang.Thread.run(Thread.java:745)

Thanks,
Modassar
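For reference, a sketch of how the two approaches above are typically expressed as request parameters. The field name `field` is taken from the query in the question; everything else is illustrative, not a reproduction of the poster's setup. Note that both approaches are HyperLogLog-based estimates, so neither is guaranteed to be 100% accurate; the cardinality value (0.0 to 1.0) trades memory for accuracy.

```python
import json
from urllib.parse import urlencode

# 1. StatsComponent with the {!cardinality} local param: a HyperLogLog-based
#    estimate whose memory/accuracy trade-off is tuned between 0.0 and 1.0.
stats_params = {
    "q": "*:*",
    "stats": "true",
    "stats.field": "{!cardinality=1.0}field",
}

# 2. The hll() aggregation of the JSON Facet API (also HyperLogLog-based).
facet_params = {
    "q": "*:*",
    "json.facet": json.dumps({"distinct_values": "hll(field)"}),
}

# Either dict becomes the query string of a /select request:
print(urlencode(stats_params))
print(urlencode(facet_params))
```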
Re: (possible)SimplePostTool problem --(Windows, Bitnami distribution)
Hi there! I encountered the same problem as you did. Have you found the answer yet? I would be really thankful if you could share your experience! Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/possible-SimplePostTool-problem-Windows-Bitnami-distribution-tp4199980p4222382.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SOLR Physical Memory leading to OOM
On 8/10/2015 7:07 PM, rohit wrote:
> Thanks Shawn. I was looking at the Solr Admin UI and also using the top command on the server.

The amount of free memory shown by tools like that is not a very good way to determine what's happening with your memory. As I said before, it's completely normal for the OS to utilize almost all of your physical memory, even if your programs only require a fraction of it. It is not a meaningful metric for success.

> I'm running an endurance test for 4 hours at 50 TPS and I see the physical memory keep increasing during that time, and we have a scheduled delta import during that time frame which can import up to 4 million docs. After the import I see the memory increase again, and there comes a point when there is no more memory left, which in turn leads to OOM.

If you're hitting OOM, then you need to increase the heap size. Solr is requiring more memory than you have assigned to the heap. This will reduce the amount of memory available for the OS disk cache, which may reduce performance. There may be ways you can reduce heap usage by adjusting your configuration or the way you use Solr.

http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

There is other good information on that page. I would encourage you to go to the top of the page and read all of it. 50 queries per second is a lot of load for a single server. I would only expect to see success with that many queries per second if the index fits entirely into the OS disk cache. It may also be necessary for the index to be relatively small.

> I have seen one more thing: if there is no activity on the server (no import, no search going on), I have not seen the memory coming down from the state which was created after the test.

In general, once Java grabs memory, it doesn't let it go. As I previously mentioned, it cannot grab more than you ask for, plus some overhead. The overhead may be a few hundred MB, which is not very much when you're talking about multiple gigabytes.

> A couple of things to notice: 1.
> We are storing data and indexing it also (not sure if that is causing the problem). 2. Is 8 GB enough for 10 million or more documents? 3. We have a custom handler which extends the Solr handlers to return data when the client calls it.

I couldn't tell you whether 8GB is enough for 10 million documents. That depends on what's in those documents, what your schema.xml says, how you query Solr, and a few other factors. Even if you tell me the answers to these questions, I *still* may not be able to say whether it's enough. I *might* be able to tell you that it's NOT enough, though. The only way to be absolutely sure is to prototype: actually try it out.

https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

8GB of RAM is a *very* small system in the world of Solr. My systems have 64GB of RAM, and I frequently wish that was 256GB. My indexes are somewhat larger than yours, though.

Thanks,
Shawn
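For reference, with the Solr 5.x scripts the heap Shawn describes is set at startup; a minimal sketch (the 4g value is purely illustrative, not a sizing recommendation):

```shell
# Start Solr with a 4 GB Java heap; bin/solr's -m flag sets both -Xms and -Xmx
bin/solr start -m 4g
```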
Re: Solr old log files are not archived or removed automatically.
On 8/11/2015 3:10 AM, Adrian Liew wrote:
> Hi Erick,
>> 1> how did you install/run your Solr? As a service or regular? See the reference guide, "Permanent Logging Settings", for some info on the difference there.
> What is the difference between regular or service?

On certain operating systems, you can use a shell script that comes with Solr 5.x to install Solr as a service, with an init script to start it on boot. Regular would mean a manual start using the bin/solr script.

>> 2> what does your log4j.properties file look like?
> Here are the contents of the log4j.properties file:
>
> # Logging level
> solr.log=logs
> log4j.rootLogger=INFO, file, CONSOLE
>
> log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
> log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
> log4j.appender.CONSOLE.layout.ConversionPattern=%-4r [%t] %-5p %c %x [%X{collection} %X{shard} %X{replica} %X{core}] \u2013 %m%n
>
> #- size rotation with log cleanup.
> log4j.appender.file=org.apache.log4j.RollingFileAppender
> log4j.appender.file.MaxFileSize=4MB
> log4j.appender.file.MaxBackupIndex=9
>
> #- File to log to and log format
> #log4j.appender.file.File=${solr.log}/solr.log
> log4j.appender.file.File=C:/solr_logs/solr.log
> log4j.appender.file.layout=org.apache.log4j.PatternLayout
> log4j.appender.file.layout.ConversionPattern=%-5p - %d{yyyy-MM-dd HH:mm:ss.SSS}; [%X{collection} %X{shard} %X{replica} %X{core}] %C; %m\n
>
> log4j.logger.org.apache.zookeeper=WARN
> log4j.logger.org.apache.hadoop=WARN
>
> # set to INFO to enable infostream log messages
> log4j.logger.org.apache.solr.update.LoggingInfoStream=OFF
>
> I am not sure how best I can limit the size of the solr_logs directory. Does log4j come with a feature to remove old log files after a given retention period?

The section of the log4j.properties file entitled "size rotation with log cleanup" describes the built-in rotation for the Solr log. It will keep nine backup logfiles, and each one will be limited in size to 4MB. That means that the maximum size of the logs for *solr* is about 40MB.
If you aren't seeing this behavior, then there are a few possible problems. Your properties file may have a bug in it; it looks correct to me, but I haven't tried to actually validate it. It might not be Solr (log4j) that's producing the problem logfiles; it could be Jetty, or something else entirely. Or your Java VM might not be using the properties file that you included here.

Thanks,
Shawn
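The roughly 40MB bound Shawn gives follows directly from the quoted rotation settings:

```python
# Values from the quoted log4j.properties
max_file_size_mb = 4   # log4j.appender.file.MaxFileSize=4MB
max_backup_index = 9   # log4j.appender.file.MaxBackupIndex=9

# One active solr.log plus nine rotated backups, each capped at 4MB
total_mb = max_file_size_mb * (max_backup_index + 1)
print(total_mb)  # 40
```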
Re: Performance warning overlapping onDeckSearchers
On 8/11/2015 3:02 AM, Adrian Liew wrote: Has anyone come across this issue, [some_index] PERFORMANCE WARNING: Overlapping onDeckSearchers=2? I am currently using Solr v5.2.1. What does this mean? Does this raise red flags? I am currently encountering an issue whereby my Sitecore system is unable to update the index appropriately. I am not sure if this is linked to the warnings above. https://wiki.apache.org/solr/FAQ#What_does_.22PERFORMANCE_WARNING:_Overlapping_onDeckSearchers.3DX.22_mean_in_my_logs.3F What the wiki page doesn't explicitly state is that increasing maxWarmingSearchers is usually the wrong way to solve this, because that can actually make the problem *worse*. It is implied by the things the page DOES say, but it is not stated. Thanks, Shawn
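For context, overlapping searchers usually mean that commits which open new searchers are arriving faster than searcher warming can finish. A hedged solrconfig.xml sketch of the usual remedy, committing visibility changes less often instead of raising maxWarmingSearchers (the interval values here are illustrative, not taken from the FAQ or from Sitecore's configuration):

```xml
<autoCommit>
  <maxTime>60000</maxTime>            <!-- hard commit for durability every minute -->
  <openSearcher>false</openSearcher>  <!-- without opening a new searcher -->
</autoCommit>
<autoSoftCommit>
  <maxTime>120000</maxTime>           <!-- open a new searcher at most every 2 minutes -->
</autoSoftCommit>
```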
Re: Deadlock-like behavior when new IndexWriter created
Hmm, it would be _really_ helpful if next time it happens you could get a stack trace (see jstack, which should have come with your Java). As it happens we're chasing another deadlock, and it'd be interesting to see if they're related. Thanks! Erick

On Tue, Aug 11, 2015 at 1:12 AM, Andrii Berezhynskyi andrii.berezhyns...@home24.de wrote:
> Hi, I have Solr 5.2.1 set up in a master-slave configuration. Very often it happens that the Solr slave starts replicating (I can see it in the admin panel) but it gets stuck at 0% and never proceeds further. Usually a restart of the slave helps. Relevant logs from the slave:
>
> INFO - 2015-08-11 07:56:00.184; org.apache.solr.handler.IndexFetcher; Master's generation: 26
> INFO - 2015-08-11 07:56:00.188; org.apache.solr.handler.IndexFetcher; Slave's generation: 25
> INFO - 2015-08-11 07:56:00.189; org.apache.solr.handler.IndexFetcher; Starting replication process
> INFO - 2015-08-11 07:56:00.205; org.apache.solr.handler.IndexFetcher; Number of files in latest index in master: 10
> INFO - 2015-08-11 07:56:00.209; org.apache.solr.core.CachingDirectoryFactory; return new directory for /var/solr/data/catalog_article_1_de_DE/data/index.20150811075600209
> *INFO - 2015-08-11 07:56:00.212; org.apache.solr.update.DefaultSolrCoreState; Creating new IndexWriter...*
> *INFO - 2015-08-11 07:56:00.221; org.apache.solr.update.DefaultSolrCoreState; Waiting until IndexWriter is unused... core=catalog_article_1_de_DE*
> INFO - 2015-08-11 07:56:00.522; org.apache.solr.core.SolrCore; [catalog_article_1_de_DE] webapp=/solr path=/select params={} hits=0 status=0 QTime=1
> INFO - 2015-08-11 07:56:03.654; org.apache.solr.core.SolrCore; [catalog_article_1_de_DE] webapp=/solr path=/select params={} hits=0 status=0 QTime=1
>
> Here are the relevant solrconfig.xml entries:
>
> <updateHandler class="solr.DirectUpdateHandler2">
>   <updateLog>
>     <str name="dir">${solr.catalog_article_1_de_DE.data.dir:}</str>
>   </updateLog>
>   <autoCommit>
>     <maxDocs>1</maxDocs>
>     <maxTime>30</maxTime>
>     <openSearcher>false</openSearcher>
>   </autoCommit>
>   <autoSoftCommit>
>     <maxTime>15000</maxTime>
>   </autoSoftCommit>
> </updateHandler>
>
> <requestHandler name="/replication" class="solr.ReplicationHandler">
>   <lst name="master">
>     <str name="enable">${enable.master:false}</str>
>     <str name="replicateAfter">optimize</str>
>     <str name="confFiles">schema.xml,solrconfig.xml</str>
>   </lst>
>   <lst name="slave">
>     <str name="enable">${enable.slave:false}</str>
>     <str name="masterUrl">${master.url:127.0.0.1:8983}/${solr.core.name}</str>
>     <str name="pollInterval">00:01:00</str>
>   </lst>
> </requestHandler>
>
> Has anybody faced the same problem? Is it the master's or the slave's issue? How can I debug/fix the problem? Thanks Andrii
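For the stack trace Erick asks for, running jstack against the stuck slave's JVM is enough; a sketch (the pid is a placeholder to fill in, e.g. from jps):

```shell
# Dump all thread stacks, including lock/ownership info, from the running Solr JVM
jstack -l <solr-pid> > solr-threads.txt
```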
Solr MLT with stream.body returns different results on each shard
I have a fresh install of Solr 5.2.1 with about 3 million docs freshly indexed (I can also reproduce this issue on 4.10.0). When I use the Solr MoreLikeThisHandler with a content stream I'm getting different results per shard. I also looked at using a standard MLT query, but I need to be able to stream in a fairly large block of text for comparison that is not in the index (a different type of document). A standard MLT query

http://testsolr2:8983/solr/mega/select?q=electronics&mlt.fl=text&mlt.mintf=0&fl=id,score

appears to return consistent results between shards. Any reason why the content stream query would be different between shards? Thank you for your help! Aaron

*Content Stream Example:*

http://testsolr1:8983/solr/mega/mlt?stream.body=electronics&mlt.fl=text&mlt.mintf=0&fl=id,score

*Returns:*

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">3</int>
  </lst>
  <result name="response" numFound="1590" start="0">

http://testsolr2:8983/solr/mega/mlt?stream.body=electronics&mlt.fl=text&mlt.mintf=0&fl=id,score

*Returns:*

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <result name="response" numFound="1619" start="0">
Using the date field for searching
If I query date:1885 I get an error: org.apache.solr.common.SolrException: Invalid Date String:'1885'. If I query date:1885* I get no results. And yet there are numerous docs with a year of 1885 in the date string, like so: <arr name="date"><date>1885-02-08T00:00:00Z</date></arr>. If I query date:1885-02-08T00:00:00Z I get 9 results?? Do the users really have to specify a full XML-compliant date string to use the date: field for searching? thanks, Scott
Re: SOLR Physical Memory leading to OOM
Thanks a lot, Shawn! Just wanted to clarify: we have SolrCloud, so in my testing it's not a single server I'm hitting; I have multiple servers. At a time we have 4 leaders and 4 replicas which communicate using ZooKeeper. So in total we have 8 servers, and ZooKeeper is installed on 5 of them. As per your other article, we are planning to move from Java 1.7 to either 1.7.0_72-b14 or Java 8.
Re: (possible)SimplePostTool problem --(Windows, Bitnami distribution)
What was the actual command line used for the failing attempts? Try using -Dauto=yes (java -Dauto=yes -Dc=tika -jar post.jar ….) Check out "post.jar -h" for more details on command-line options. — Erik Hatcher, Senior Solutions Architect http://www.lucidworks.com

On Apr 15, 2015, at 3:23 PM, kenadian adr...@r-2.ca wrote:
> Hello all, my Bitnami/*Solr-5.0.0* installation is not able to index any type of file (found in the provided examples folders or anywhere else) except HTML. Tested on the files in the exampledocs folder (books.csv, books.json, ..., utf8-example.xml, vidcard.xml) I get: for *.csv* files the response "Unexpected character 'i'" (depending on what is the first character in the file), for *.xml* files the response "ERROR: unknown field 'id'", for *.pdf* files the response "Invalid UTF-8 middle byte 0xe5", and so forth. Even *.TXT* files are not handled: I get the response "Unexpected character 'T'" (depending on what is the first character in the file: "This is a test of TXT extraction in Solr, it is only a test. Do not panic."). The only type that works is *HTML*:
>
> C:\Bitnami\solr-5.0.0-0\apache-solr\solr\exampledocs> java -Dc=tika -jar post.jar *.html
> SimplePostTool version 5.0.0
> Posting files to [base] url http://localhost:8983/solr/tika/update using content-type application/xml...
> POSTing file sample.html to [base]
> 1 files indexed.
> COMMITting Solr index changes to http://localhost:8983/solr/tika/update...
> Time spent: 0:00:00.313
>
> I use Windows 8.1, Java version 1.8.0_40. Any ideas of how to fix this? Many thanks.
Re: Choosing the order of the fields to be displayed at output
On 8/11/2015 9:36 PM, Zheng Lin Edwin Yeo wrote:
> I'm using Solr 5.2.1. I understand that for the JSON format, Solr writes out the fields of each document in the order they are found in the index, as that is the fastest and most efficient way for Solr to return the data. However, this causes confusion, as each of the records has its fields arranged in a different order, because users are allowed to update fields after the document is indexed. Whenever a field is updated, that field will be displayed at the bottom of the record. Is there a way to choose the order of the fields to be displayed in the output, so that the order will be consistent for all the records?

Solr simply returns the fields in the order that Java naturally stores the information, which from a user perspective is not very predictable, and may change from one version of the code to the next, or when Java is upgraded. I think that deciding information display order is a job for client code. The application that makes the request to Solr can pick the pieces that need to be displayed to the user and decide what order they should be in.

Thanks,
Shawn
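A minimal sketch of the client-side reordering Shawn suggests, here in Python; the field list is hypothetical, and any HTTP/JSON client can do the same thing to the docs it gets back from Solr:

```python
import json

# Desired display order; fields not listed keep their original relative order.
FIELD_ORDER = ["id", "title", "author", "date"]

def reorder(doc):
    """Re-emit a Solr result doc with known fields first, in a fixed order."""
    ordered = {f: doc[f] for f in FIELD_ORDER if f in doc}
    ordered.update((k, v) for k, v in doc.items() if k not in ordered)
    return ordered

# A doc as Solr might return it after an atomic update pushed a field last
doc = {"date": "2015-08-11T00:00:00Z", "title": "Example", "id": "42"}
print(json.dumps(reorder(doc)))  # fields now come out as id, title, date
```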
Re: Streaming API running a simple query
Hi All, I have written a blog post to cover these nested merge expressions; see http://knackforge.com/blog/selvam/solr-streaming-expressions for more details. Thanks. On Mon, Aug 10, 2015 at 3:51 PM, Selvam s.selvams...@gmail.com wrote: Hi, Thanks, that seems to be working! On Sat, Aug 8, 2015 at 9:28 PM, Joel Bernstein joels...@gmail.com wrote: This sounds doable using nested merge functions like this: merge(search(...), merge(search(...), search(...), ...), ...) Joel Bernstein http://joelsolr.blogspot.com/ On Sat, Aug 8, 2015 at 8:08 AM, Selvam s.selvams...@gmail.com wrote: Hi, I needed to run multiple subqueries, each with its own limit of rows. For example: to get 30 users from country India with age greater than 30, and 50 users from England who are all male. Thanks again. On 08-Aug-2015 5:30 pm, Joel Bernstein joels...@gmail.com wrote: Can you describe your use case? Joel Bernstein http://joelsolr.blogspot.com/ On Sat, Aug 8, 2015 at 7:36 AM, Selvam s.selvams...@gmail.com wrote: Hi, Thanks, good to know. In fact my requirement needs to merge multiple expressions, while the current streaming expressions support only two expressions. Do you think we can expect that in future versions? On 07-Aug-2015 6:46 pm, Joel Bernstein joels...@gmail.com wrote: Hi, There is a new error handling framework in trunk (SOLR-7441) for the Streaming API / Streaming Expressions. So if you're purely in testing mode, it will be much easier to work in trunk than in Solr 5.2. If you run into errors in trunk that are still confusing, please continue to report them so we can get all the error messages covered. Thanks, Joel Joel Bernstein http://joelsolr.blogspot.com/ On Fri, Aug 7, 2015 at 6:19 AM, Selvam s.selvams...@gmail.com wrote: Hi, Sorry, it is working now. curl --data-urlencode 'stream=search(gettingstarted,q=*:*,fl=id,sort=id asc)' http://localhost:8983/solr/gettingstarted/stream I missed *'asc'* in sort :) Thanks for the help Shawn Heisey.
On Fri, Aug 7, 2015 at 3:46 PM, Selvam s.selvams...@gmail.com wrote: Hi, Thanks for your update, yes, I was missing the cloud mode, I am new to the world of Solr cloud. Now I have enabled a single node (with two shards replicas) that runs on 8983 port along with zookeeper running on 9983 port. When I run, curl --data-urlencode 'stream=search(gettingstarted,q=*:*,fl=id,sort=id)' http://localhost:8983/solr/gettingstarted/stream Again, I get Unable to construct instance of org.apache.solr.client.solrj.io.stream.CloudSolrStream . . Caused by: java.lang.reflect.InvocationTargetException . . Caused by: java.lang.ArrayIndexOutOfBoundsException: 1 I tried different port, 9983 as well, which returns Empty reply from server. I think I miss some obvious configuration. On Fri, Aug 7, 2015 at 2:04 PM, Shawn Heisey apa...@elyograg.org wrote: On 8/7/2015 1:37 AM, Selvam wrote: https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions I tried this from my linux terminal, 1) curl --data-urlencode 'stream=search(gettingstarted,q=*:*,fl=id,sort=id)' http://localhost:8983/solr/gettingstarted/stream Threw zkHost error. Then tried with, 2) curl --data-urlencode 'stream=search(gettingstarted,zkHost=localhost:8983,q=*:*,fl=id,sort=id)' http://localhost:8983/solr/gettingstarted/stream It throws me java.lang.ArrayIndexOutOfBoundsException: 1\n\tat org.apache.solr.client.solrj.io.stream.CloudSolrStream.parseComp(CloudSolrStream.java:260) The documentation page you linked seems to indicate that this is a feature that only works in SolrCloud. Your inclusion of localhost:8983 as the zkHost suggests that either you are NOT running in cloud mode, or that you do not understand what zkHost means. Zookeeper runs on a different port than Solr. 8983 is Solr's port. If you are running a 5.x cloud with the embedded zookeeper, it is most likely running on port 9983. 
If you are running in cloud mode with a properly configured external zookeeper, then your zkHost parameter will probably have three hosts in it with port 2181. Thanks, Shawn -- Regards, Selvam KnackForge
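For reference, the nesting Joel describes, combining three or more searches two at a time, looks roughly like this. The collection name, queries, and merge key below are illustrative, not taken from Selvam's setup:

```
merge(
  search(users, q="country:India AND age:[31 TO *]", fl="id", sort="id asc"),
  merge(
    search(users, q="country:England AND gender:male", fl="id", sort="id asc"),
    search(users, q="country:Germany", fl="id", sort="id asc"),
    on="id asc"),
  on="id asc")
```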
Re: Filter Out Facet Results
One solution is to filter these out at indexing time. The StopFilter with a custom stop list file could do the trick; you'll probably need to adjust your field type definition to be a TextField instead of a StrField, use a KeywordTokenizer, and then a StopFilter. — Erik Hatcher, Senior Solutions Architect http://www.lucidworks.com

On Aug 10, 2015, at 6:28 PM, Paden rumsey...@gmail.com wrote:
> Hello, I'm trying to figure out how to filter particular facets out of my results. I'm doing some Named Entity Extraction and putting the entities up as faceting information. However, not all the results I get are exact. For example, the string "w 5th street" will appear in the Person facet list. These entities are the same every time; I know what they will be, so I can predictably say which ones will be wrong. I was wondering if there was a way to write something into the solrconfig to filter out these bad entities. I know that a filter query can be a great way to INCLUDE a facet in the search or narrow a search based on a facet, but I'm not quite sure how to filter results out. Thanks in advance for any help you can provide.
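A hedged schema.xml sketch of the field type Erik describes; the type name and stopword file name are made up for illustration:

```xml
<fieldType name="entity_text" class="solr.TextField">
  <analyzer>
    <!-- keep each extracted entity as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- drop the known-bad entities, listed one per line in the stop file -->
    <filter class="solr.StopFilterFactory" words="entity-stopwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```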
Re: Using the date field for searching
Sagar, thanks, Scott

-------- Original Message -------- Subject: Re: Using the date field for searching From: "Bade, Vidya (Sagar)" vb...@webmd.net To: solr-user@lucene.apache.org Date: 08/11/2015 03:05 PM

You can use a filter query and form the date as follows when a user enters just the year, or the year and month: If just the year (1885) was entered - date:[1885-01-01T00:00:00Z TO 1886-01-01T00:00:00Z] If just the year and month (1885-06) were entered - date:[1885-06-01T00:00:00Z TO 1885-07-01T00:00:00Z] Alternatively, use DateRangeField as described at the bottom of the following webpage: https://cwiki.apache.org/confluence/display/solr/Working+with+Dates :Sagar

-----Original Message----- From: Scott Derrick [mailto:sc...@tnstaafl.net] Sent: Tuesday, August 11, 2015 3:02 PM To: solr-user@lucene.apache.org Subject: Using the date field for searching

If I query date:1885 I get an error: org.apache.solr.common.SolrException: Invalid Date String:'1885'. If I query date:1885* I get no results. And yet there are numerous docs with a year of 1885 in the date string, like so: <arr name="date"><date>1885-02-08T00:00:00Z</date></arr>. If I query date:1885-02-08T00:00:00Z I get 9 results?? Do the users really have to specify a full XML-compliant date string to use the date: field for searching? thanks, Scott

--
Sin makes its own hell, and goodness its own heaven. Mary Baker Eddy
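Sagar's mapping from partial user input to a filter query can be sketched as follows; a minimal illustration, and real code would of course validate user input more thoroughly:

```python
import re

def date_filter(user_input):
    """Turn '1885' or '1885-06' into the Solr range filters Sagar describes."""
    m = re.fullmatch(r"(\d{4})(?:-(\d{2}))?", user_input.strip())
    if not m:
        return None  # not a bare year or year-month; handle some other way
    year, month = int(m.group(1)), m.group(2)
    if month is None:
        # whole year: [Jan 1 of that year, Jan 1 of the next year)
        return "date:[%d-01-01T00:00:00Z TO %d-01-01T00:00:00Z]" % (year, year + 1)
    mo = int(month)
    # whole month: [first of that month, first of the next month)
    ny, nm = (year + 1, 1) if mo == 12 else (year, mo + 1)
    return "date:[%04d-%02d-01T00:00:00Z TO %04d-%02d-01T00:00:00Z]" % (year, mo, ny, nm)

print(date_filter("1885"))     # date:[1885-01-01T00:00:00Z TO 1886-01-01T00:00:00Z]
print(date_filter("1885-06"))  # date:[1885-06-01T00:00:00Z TO 1885-07-01T00:00:00Z]
```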
Re: Highlighting
Scott - it doesn't look like you've specified hl.fl to name which field(s) to highlight.

p.s. Erick Erickson surely likes your e-mail domain :)

— Erik Hatcher, Senior Solutions Architect http://www.lucidworks.com

On Aug 11, 2015, at 9:02 PM, Scott Derrick sc...@tnstaafl.net wrote:
> I guess I really don't get Highlighting in Solr. We are transitioning from Google Custom Search, which generally sucks but does return nicely formatted highlighted fragments. I turn highlighting on with hl=true in the query, and I get a highlighting section returned at the bottom of the page, each entry identified by the document file name with an empty {}. It doesn't matter what I search for, plain text or a field: I get a list of documents followed by an empty brace?
>
> highlighting: { /home/scott/workspace/mbel-work/tei2html/build/web/./A10385B/A10385B.html: {}, /home/scott/workspace/mbel-work/tei2html/build/web/./A10089/A10089.html: {}, /home/scott/workspace/mbel-work/tei2html/build/web/./L3/L3.html: {}, /home/scott/workspace/mbel-work/tei2html/build/web/./A10646/A10646.html: {}, /home/scott/workspace/mbel-work/tei2html/build/web/./V03482/V03482.html: {}, /home/scott/workspace/mbel-work/tei2html/build/web/./A10594/A10594.html: {}, /home/scott/workspace/mbel-work/tei2html/build/web/./645A.66.043/645A.66.043.html: {}, /home/scott/workspace/mbel-work/tei2html/build/web/./352.48.001/352.48.001.html: {}, /home/scott/workspace/mbel-work/tei2html/build/web/./144.23.001/144.23.001.html: {}, /home/scott/workspace/mbel-work/tei2html/build/web/./L18512/L18512.html: {} }
>
> I haven't made any changes to the default highlighting settings.
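A hedged sketch of the parameter Erik points out, appended to a select request. The core and field names here are illustrative; hl.fl must name stored field(s):

```
http://localhost:8983/solr/core/select?q=some+words&hl=true&hl.fl=text
```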
Choosing the order of the fields to be displayed at output
Hi, I'm using Solr 5.2.1. I understand that for the JSON format, Solr writes out the fields of each document in the order they are found in the index, as that is the fastest and most efficient way for Solr to return the data. However, this causes confusion, as each of the records has its fields arranged in a different order, because users are allowed to update fields after the document is indexed. Whenever a field is updated, that field will be displayed at the bottom of the record. Is there a way to choose the order of the fields to be displayed in the output, so that the order will be consistent for all the records? Regards, Edwin
Cross core join
I have a scenario (we are badly affected) where I have to join two cores on two different nodes. I know that there is a JIRA issue (https://issues.apache.org/jira/plugins/servlet/mobile#issue/SOLR-7090) open in support of this; is there any alternate solution that I can use to work around it until this gets released? Thanks, Sharath
Highlighting
I guess I really don't get Highlighting in Solr. We are transitioning from Google Custom Search, which generally sucks but does return nicely formatted highlighted fragments.

I turn highlighting on with hl=true in the query and I get a highlighting section returned at the bottom of the page, each entry identified by the document file name with an empty {}. It doesn't matter what I search for, plain text or a field; I get a list of documents followed by an empty brace:

"highlighting": {
  "/home/scott/workspace/mbel-work/tei2html/build/web/./A10385B/A10385B.html": {},
  "/home/scott/workspace/mbel-work/tei2html/build/web/./A10089/A10089.html": {},
  "/home/scott/workspace/mbel-work/tei2html/build/web/./L3/L3.html": {},
  "/home/scott/workspace/mbel-work/tei2html/build/web/./A10646/A10646.html": {},
  "/home/scott/workspace/mbel-work/tei2html/build/web/./V03482/V03482.html": {},
  "/home/scott/workspace/mbel-work/tei2html/build/web/./A10594/A10594.html": {},
  "/home/scott/workspace/mbel-work/tei2html/build/web/./645A.66.043/645A.66.043.html": {},
  "/home/scott/workspace/mbel-work/tei2html/build/web/./352.48.001/352.48.001.html": {},
  "/home/scott/workspace/mbel-work/tei2html/build/web/./144.23.001/144.23.001.html": {},
  "/home/scott/workspace/mbel-work/tei2html/build/web/./L18512/L18512.html": {}
}

I haven't made any changes to the default settings:

<highlighting>
  <!-- Configure the standard fragmenter -->
  <!-- This could most likely be commented out in the default case -->
  <fragmenter name="gap" default="true" class="solr.highlight.GapFragmenter">
    <lst name="defaults">
      <int name="hl.fragsize">100</int>
    </lst>
  </fragmenter>

  <!-- A regular-expression-based fragmenter (for sentence extraction) -->
  <fragmenter name="regex" class="solr.highlight.RegexFragmenter">
    <lst name="defaults">
      <!-- slightly smaller fragsizes work better because of slop -->
      <int name="hl.fragsize">70</int>
      <!-- allow 50% slop on fragment sizes -->
      <float name="hl.regex.slop">0.5</float>
      <!-- a basic sentence pattern -->
      <str name="hl.regex.pattern">[-\w ,/\n\&quot;&apos;]{20,200}</str>
    </lst>
  </fragmenter>

  <!-- Configure the standard formatter -->
  <formatter name="html" default="true" class="solr.highlight.HtmlFormatter">
    <lst name="defaults">
      <str name="hl.simple.pre"><![CDATA[<em>]]></str>
      <str name="hl.simple.post"><![CDATA[</em>]]></str>
    </lst>
  </formatter>

  <!-- Configure the standard encoder -->
  <encoder name="html" class="solr.highlight.HtmlEncoder" />

  <!-- Configure the standard fragListBuilder -->
  <fragListBuilder name="simple" class="solr.highlight.SimpleFragListBuilder"/>

  <!-- Configure the single fragListBuilder -->
  <fragListBuilder name="single" class="solr.highlight.SingleFragListBuilder"/>

  <!-- Configure the weighted fragListBuilder -->
  <fragListBuilder name="weighted" default="true" class="solr.highlight.WeightedFragListBuilder"/>

  <!-- default tag FragmentsBuilder -->
  <fragmentsBuilder name="default" default="true" class="solr.highlight.ScoreOrderFragmentsBuilder">
    <!--
    <lst name="defaults">
      <str name="hl.multiValuedSeparatorChar">/</str>
    </lst>
    -->
  </fragmentsBuilder>

  <!-- multi-colored tag FragmentsBuilder -->
  <fragmentsBuilder name="colored" class="solr.highlight.ScoreOrderFragmentsBuilder">
    <lst name="defaults">
      <str name="hl.tag.pre"><![CDATA[
        <b style="background:yellow">,<b style="background:lawgreen">,
        <b style="background:aquamarine">,<b style="background:magenta">,
        <b style="background:palegreen">,<b style="background:coral">,
        <b style="background:wheat">,<b style="background:khaki">,
        <b style="background:lime">,<b style="background:deepskyblue">]]></str>
      <str name="hl.tag.post"><![CDATA[</b>]]></str>
    </lst>
  </fragmentsBuilder>

  <boundaryScanner name="default" default="true" class="solr.highlight.SimpleBoundaryScanner">
    <lst name="defaults">
      <str name="hl.bs.maxScan">10</str>
      <str name="hl.bs.chars">.,!? &#9;&#10;&#13;</str>
    </lst>
  </boundaryScanner>

  <boundaryScanner name="breakIterator" class="solr.highlight.BreakIteratorBoundaryScanner">
    <lst name="defaults">
      <!-- type should be one of CHARACTER, WORD(default), LINE and SENTENCE -->
      <str name="hl.bs.type">WORD</str>
      <!-- language and country are used when constructing Locale object. -->
      <!-- And the Locale object will be used when getting instance of BreakIterator -->
      <str name="hl.bs.language">en</str>
      <str name="hl.bs.country">US</str>
    </lst>
  </boundaryScanner>
</highlighting>
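[Editor's note: the symptom above (every document id mapping to an empty highlight object) is easy to detect programmatically before rendering results. A minimal sketch; the short ids doc1/doc2 are hypothetical stand-ins for the long file-path ids in the actual response:]

```python
import json

# Hypothetical document ids stand in for the file-path ids shown above;
# the response shape (every id mapping to an empty object) is the same.
raw = '{"highlighting": {"doc1": {}, "doc2": {}}}'
response = json.loads(raw)

# Collect ids that came back with no highlight fragments at all.
empty = [doc_id for doc_id, frags in response["highlighting"].items() if not frags]
print(empty)  # → ['doc1', 'doc2']
```

If that list covers every returned document, the request itself is at fault rather than any one document's stored content.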
Re: Highlighting
bq: Erick Erickson surely likes your e-mail domain :)

Yep, I envy that one!

On Tue, Aug 11, 2015 at 6:27 PM, Erik Hatcher erik.hatc...@gmail.com wrote:

  Scott - it doesn't look like you've specified hl.fl, which names the field(s) to highlight.

  p.s. Erick Erickson surely likes your e-mail domain :)

  —
  Erik Hatcher, Senior Solutions Architect
  http://www.lucidworks.com

  On Aug 11, 2015, at 9:02 PM, Scott Derrick sc...@tnstaafl.net wrote:

  [quoted original message and unchanged default highlighting config snipped]
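[Editor's note: the fix suggested above is to add hl.fl to the request. A sketch of building such a query URL; the core name mbel, field name text, and search term are assumptions for illustration, not taken from the thread:]

```python
from urllib.parse import urlencode

# Hypothetical core name ("mbel") and field name ("text"); hl.fl must name
# stored field(s), since highlighting reads the stored text to build fragments.
params = {
    "q": "emigration",   # hypothetical search term
    "hl": "true",        # turn highlighting on
    "hl.fl": "text",     # field(s) to highlight -- omitting this yields empty {}
    "wt": "json",
}
url = "http://localhost:8983/solr/mbel/select?" + urlencode(params)
print(url)
```

With hl.fl present (and the field stored), each document id in the highlighting section should map to a list of snippet strings instead of {}.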