&lt;str name="stream_size"&gt;null&lt;/str&gt; when using HttpSolrServer
Good day, I recently moved to SolrJ 3.6.1. As the CommonsHttpSolrServer class is deprecated in that version I migrated to HttpSolrServer. But now Tika does not generate the stream_size field correctly: the response for an arbitrary JPEG file contains &lt;str name="stream_size"&gt;null&lt;/str&gt;. Is there any known way to fix that? The extract handler is defined as follows in solrconfig.xml:

<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="fmap.owner">file_owner</str>
    <str name="fmap.path">file_path</str>
  </lst>
</requestHandler>

The field in schema.xml looks like this:

<field name="stream_size" type="string" indexed="true" stored="true" multiValued="false"/>

Kind regards, Silvio
Solr / Velocity url rewrite
Hi all, Today I'm using Solritas as the front-end for the Solr search engine, but I would like to do URL rewriting to deliver URLs more compliant with SEO. First the end user types that kind of URL: http://host.com/query/myquery This URL should be rewritten internally (kind of reverse proxy) into http://localhost:8983/query?q=myquery. This internal URL should not be displayed to the end user, and in return, when the result page is displayed, all the links in the page should be rewritten with an SEO-compliant URL. I tried to perform some tests with an Apache front end using mod_proxy, but I didn't succeed in passing URL parameters. Has anyone ever tried to do SEO with the Solr search engine (Solritas front)? Thanks for your help.
RE: Continuous Ping query caused exception: java.util.concurrent.RejectedExecutionException
https://issues.apache.org/jira/browse/SOLR-4037 -Original message- From:Mark Miller markrmil...@gmail.com Sent: Sat 03-Nov-2012 14:24 To: solr-user@lucene.apache.org Subject: Re: Continuous Ping query caused exception: java.util.concurrent.RejectedExecutionException On Nov 1, 2012, at 5:39 AM, Markus Jelsma markus.jel...@openindex.io wrote: File bug? Please. - Mark
RE: SolrCloud indexing blocks if node is recovering
https://issues.apache.org/jira/browse/SOLR-4038 Still trying to gather the logs -Original message- From:Mark Miller markrmil...@gmail.com Sent: Sat 03-Nov-2012 14:17 To: Markus Jelsma markus.jel...@openindex.io Cc: solr-user@lucene.apache.org Subject: Re: SolrCloud indexing blocks if node is recovering The OOM machine and any surrounding if possible (eg especially the leader of the shard). Not sure what I'm looking for yet, so the more info the better. - Mark On Nov 3, 2012, at 5:23 AM, Markus Jelsma markus.jel...@openindex.io wrote: Hi - yes, i should be able to make sense out of them next monday. I assume you're not too interested in the OOM machine but all surrounding nodes that blocked instead? -Original message- From:Mark Miller markrmil...@gmail.com Sent: Sat 03-Nov-2012 03:14 To: solr-user@lucene.apache.org Subject: Re: SolrCloud indexing blocks if node is recovering Doesn't sound right. Still have the logs? - Mark On Fri, Nov 2, 2012 at 9:45 AM, Markus Jelsma markus.jel...@openindex.io wrote: Hi, We just tested indexing some million docs from Hadoop to a 10 node 2 rep SolrCloud cluster with this week's trunk. One of the nodes gave an OOM but indexing continued without interruption. When i restarted the node indexing stopped completely, the node tried to recover - which was unsuccessful. I restarted the node again but that wasn't very helpful either. Finally i decided to stop the node completely and see what happens - indexing resumed. Why or how won't the other nodes accept incoming documents when one node behaves really bad? The dying node wasn't the node we were sending documents to and we are not using CloudSolrServer yet (see other thread). Is this known behavior? Is it a bug? Thanks, Markus -- - Mark
Re: Where to get more documents or references about sold cloud?
LucidFind is a searchable archive of Solr documentation and email lists: http://find.searchhub.org/?q=solrcloud - Original Message - | From: Jack Krupansky j...@basetechnology.com | To: solr-user@lucene.apache.org | Sent: Monday, November 5, 2012 4:44:46 AM | Subject: Re: Where to get more documents or references about sold cloud? | | Is most of the Web blocked in your location? When I Google | SolrCloud, | Google says that there are About 61,400 results with LOTS of | informative | links, including blogs, videos, slideshares, etc. just on the first | two | pages of search results alone. | | If you have specific questions, please ask them with specific detail, | but | try reading a few of the many sources of information available on the | Web | first. | | -- Jack Krupansky | | -Original Message- | From: SuoNayi | Sent: Monday, November 05, 2012 3:32 AM | To: solr-user@lucene.apache.org | Subject: Where to get more documents or references about sold cloud? | | Hi all, there is only one entry about solr cloud on the | wiki, http://wiki.apache.org/solr/SolrCloud. | I have googled a lot and found no more details about solr cloud, or | maybe I | miss something? | |
Re: Does SolrCloud supports MoreLikeThis?
The question you meant to ask is: Does MoreLikeThis support Distributed Search? and the answer apparently is no. This is the issue to get it working: https://issues.apache.org/jira/browse/SOLR-788 (Distributed Search is independent of SolrCloud.) If you want to make unit tests, that would really help; they won't work now but they will make it easier for someone to get the patch working again. Also, the patch will not get committed without unit tests. Lance - Original Message - | From: Luis Cappa Banda luisca...@gmail.com | To: solr-user@lucene.apache.org | Sent: Monday, November 5, 2012 7:54:59 AM | Subject: Re: Does SolrCloud supports MoreLikeThis? | | Thanks for the answer, Darren! I still have the hope that MLT is | supported | in the current version. An important feature of the product that I´m | developing depends on that, and even if I can emulate MLT with a | Dismax or | E-dismax component, the thing is that MLT fits and works perfectly... | | Regards, | | Luis Cappa. | | | 2012/11/5 Darren Govoni dar...@ontrenet.com | | There is a ticket for that with some recent activity (sorry I don't | have | it handy right now), but I'm not sure if that work made it into the | trunk, | so probably solrcloud does not support MLT...yet. Would love an | update from | the dev team though! | | --- Original Message --- | On 11/5/2012 10:37 AM Luis Cappa Banda wrote: That´s the | question, :-) | | Regards, | | Luis Cappa. | |
GC stalls cause Zookeeper timeout during uninvert for facet field
Hi, We are running a small solr cluster with 8 cores on 4 machines. This database has about 1E9 very small documents. One of the statistics we need requires a facet on a text field with high cardinality. During the uninvert phase of this text field the searchers experience long stalls because of the garbage collector (20+ second pauses), which causes Solr to lose the Zookeeper lease. Often they do not recover gracefully and as a result the cluster becomes degraded: SEVERE: There was a problem finding the leader in zk:org.apache.solr.common.SolrException: Could not get leader props This is a known open issue. I explored several options to try and work around this. However I'm new to Solr and need some help. We tried running more cores: We went from 4 to 8 cores. Does it make sense to go to 16 cores on 4 machines? GC tuning: This helped a lot but not enough to prevent the lease expirations. I'm by no means a Java GC expert and would appreciate any tips to improve this further. Current settings are: Java HotSpot(TM) 64-Bit Server VM (20.0-b11) -Xloggc:/home/solr/solr/log/gc.log -XX:+PrintGCApplicationStoppedTime -XX:+PrintTenuringDistribution -XX:+PrintClassHistogram -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=75 -XX:MaxGCPauseMillis=1 -XX:+CMSIncrementalMode -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -Djava.awt.headless=true -Xss256k -Xmx18g -Xms1g -DzkHost=ds30:2181,ds31:2181,ds32:2181 Actual memory stats according to top are: 74GB virtual, 11GB resident. 
The GC log shows: - age 1: 39078968 bytes, 39078968 total : 342633K->38290K(345024K), 24.7992520 secs] 9277535K->9058682K(11687832K) icms_dc=73 , 24.7993810 secs] [Times: user=366.87 sys=26.31, real=24.79 secs] Total time for which application threads were stopped: 24.8005790 seconds 975.478: [GC 975.478: [ParNew Desired survivor size 19628032 bytes, new threshold 1 (max 4) - age 1: 38277672 bytes, 38277672 total : 343750K->37537K(345024K), 22.4217640 secs] 9364142K->9131962K(11687832K) icms_dc=73 , 22.4218650 secs] [Times: user=331.25 sys=23.85, real=22.42 secs] Total time for which application threads were stopped: 22.4231750 seconds etc. Solr version: 4.0.0.2012.10.06.03.04.33 Current hardware consists of 4 machines, of which each has: 2x E5645 CPU, total of 24 cores 48GB mem 8 x SATA 7200RPM in raid 10 What would be a good strategy to try and get this database to perform the way we need it? Would it make sense to split it up into 16 shards? Ways to improve the GC behavior? Any help would be greatly appreciated. AJ -- Arend-Jan Wijtzes -- Wiseguys -- www.wise-guys.nl
Re: SolrCloud - configuration management in ZooKeeper
Hi Alexey, responses are inline:

| Zookeeper manages not only the cluster state, but also the common configuration files. My question is, what are the exact rules of precedence? That is, when SOLR node will decide to download new configuration files?

When the SolrCore is started.

| Will configuration files be updated from ZooKeeper every time the core is refreshed?

Yes, every time the SolrCore is reloaded. If you need to force this, you can either reload all the cores or reload the collection: https://issues.apache.org/jira/browse/SOLR-3488

| What if bootstrapping is defined (bootstrap_configdir)? Will the node always try to upload?

If bootstrap_confdir is set, and the config name is always the same, every time you start Solr it will upload the configuration files and override the old ones in the same zk location.

| What are the best practices for production environment? Is it better to use external tool (ZkCLI) to trigger configuration changes?

I would at least not attach the bootstrap_confdir to a start script and make it explicit. There are some Solr specific zk scripts that you can use. See http://wiki.apache.org/solr/SolrCloud#Getting_your_Configuration_Files_into_ZooKeeper I would use Solr's zk script for managing the configuration.

Tomás

| Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-configuration-management-in-ZooKeeper-tp4018432.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr4 data import skipdoc and regex
Hi *, I want to import some data to build a Solr index. For this import, I need to skip some documents. In my data-config file it looks like this:

<field column="$skipDoc" regex="^MyPattern .*" replaceWith="true" sourceColName="text"/>

As I also need to search my 'titles' I tried this:

<field column="$skipDoc" regex="^MyPattern .*" replaceWith="true" sourceColName="text"/>
<field column="$skipDoc" regex="^MyPattern2 .*" replaceWith="true" sourceColName="title"/>

This couldn't work - that's now clear to me ;-) But how can I do it? Thanks in advance :-) Randy -- View this message in context: http://lucene.472066.n3.nabble.com/Solr4-data-import-skipdoc-and-regex-tp4018495.html Sent from the Solr - User mailing list archive at Nabble.com.
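[A possible way around this, as an untested sketch: since both RegexTransformer rules write to the same $skipDoc column and can clobber each other, DIH's ScriptTransformer can evaluate both conditions in one place. The entity name and query below are placeholders for whatever the real data-config uses.]

```xml
<dataConfig>
  <script><![CDATA[
    // set $skipDoc when either source column matches its pattern
    function skipUnwanted(row) {
      var text  = row.get('text');
      var title = row.get('title');
      if ((text  != null && /^MyPattern /.test(text)) ||
          (title != null && /^MyPattern2 /.test(title))) {
        row.put('$skipDoc', 'true');
      }
      return row;
    }
  ]]></script>
  <document>
    <!-- name and query are placeholders -->
    <entity name="doc" query="..." transformer="script:skipUnwanted">
      <field column="text"/>
      <field column="title"/>
    </entity>
  </document>
</dataConfig>
```

[Because the JavaScript function sees the whole row, the flag is only ever set, never overwritten by a later non-matching rule.]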
Re: Add new shard will be treated as replicas in Solr4.0?
bq: where can i find all the items on the road map? Well, you really can't <G>... There's no official roadmap. I happen to know this since I follow the developer's list and I've seen references to this being important to the folks doing SolrCloud development work and it's been a recurring theme on the user's list. It's one of those things that _everybody_ understands would be useful in certain circumstances, but nobody has had time to actually implement yet. You can track this at: https://issues.apache.org/jira/browse/SOLR-2592 Best Erick On Mon, Nov 5, 2012 at 7:57 PM, Zeng Lames lezhi.z...@gmail.com wrote: btw, where can i find all the items in the road map? thanks! On Tue, Nov 6, 2012 at 8:55 AM, Zeng Lames lezhi.z...@gmail.com wrote: hi Erick, thanks for your kindly response. hv got the information from the SolrCloud wiki. think we may need to define the shard numbers when we really rollout it. thanks again On Mon, Nov 5, 2012 at 8:40 PM, Erick Erickson erickerick...@gmail.com wrote: Not at present. What you're interested in is shard splitting which is certainly on the roadmap but not implemented yet. To expand the number of shards you'll have to reconfigure, then re-index. Best Erick On Mon, Nov 5, 2012 at 4:09 AM, Zeng Lames lezhi.z...@gmail.com wrote: Dear All, we have an existing solr collection, 2 shards, numOfShard is 2. and there are already records in the index files. now we start another solr instance with ShardId= shard3, and found that Solr treat it as replicas. check the zookeeper data, found the range of shard doesn't change correspondingly. shard 1 is 0-7fff, while shard 2 is 8000-. is there any way to increase new shard for existing collection? thanks a lot! Lames
Re: How to re-read the config files in Solr, on a commit
Not that I know of. This would be extremely expensive in the usual case. Loading up configs, reconfiguring all the handlers etc. would add a huge amount of overhead to the commit operation, which is heavy enough as it is. What's the use-case here? Changing your configs really often and reading them on commit sounds like a way to make for a very confusing application! But if you really need to re-read all this info on a running system, consider the core admin RELOAD command. Best Erick On Mon, Nov 5, 2012 at 8:43 PM, roz dev rozde...@gmail.com wrote: Hi All I am keen to find out if Solr exposes any event listener or other hooks which can be used to re-read configuration files. I know that we have firstSearcher event but I am not sure if it causes request handlers to reload themselves and read the conf files again. For example, if I change the synonym file and solr gets a commit, will it re-initialize request handlers and re-read the conf files. Or, are there some events which can be listened to? Any inputs are welcome. Thanks Saroj
Searching for Partial Words
Hi, Given the following values in the documents: Doc1: Engine Doc2: Engineer Doc3: ResidentEngineer We need to return all three documents when someone searches for engi. Basically we need to implement partial word search. Currently, we have a wild card on the right side of the search term (term*). Is it possible to have a wild card on both sides of a search term? Regards, Sohail Aboobaker.
Re: load balance with SolrCloud
I think you're conflating shards and cores. Shards are physical slices of a single logical index. An incoming query is sent to each and every shard and the results tallied. The case you're talking about seems to be more that you have N separate indexes (cores), where each core is for a specific user. This is vastly different from SolrCloud, which puts all the data into one huge logical index! Furthermore, presently there's no way to direct specific documents to specific shards in SolrCloud (although a pluggable sharding mechanism is under development). You might be interested in SOLR-1293 (under development) for managing lots of cores. On Mon, Nov 5, 2012 at 4:26 PM, Jie Sun jsun5...@yahoo.com wrote: we are using solr 3.5 in production and we deal with customers data of terabytes. we are using shards for large customers and write our own replica management in our software. Now with the rapid growth of data, we are looking into solrcloud for its robustness of sharding and replications. I understand by reading some documents online that there is no SPOF using solrcloud, so any instance in the cluster can serve the query/index. However, is it true that we need to write our own load balancer in front of solrCloud? For example if we want to implement a model similar to Loggly, i.e. each customer starts indexing into a small shard of its own, then if any of the customers grow more than the small shard's limit, we switch to index into another small shard (we call it front end shard), meanwhile merging the just released small shard into the next level larger shard. Since the merge can happen between two instances on different servers, we probably end up synching the index files for the merging shards and then using solr merge. I am curious if there is anything solr provides to help with this kind of strategy for dealing with unevenly growing big customer data (a core)? or do we have to write these in our software layer from scratch? 
thanks Jie -- View this message in context: http://lucene.472066.n3.nabble.com/load-balance-with-SolrCloud-tp4018367.html Sent from the Solr - User mailing list archive at Nabble.com.
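[The per-customer-core layout Erick contrasts with SolrCloud sharding can be sketched in a pre-SolrCloud solr.xml roughly like this; the core names are placeholders, not anything from the thread:]

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- one independent index per customer; additional cores can be
         created at runtime through the CoreAdmin CREATE command -->
    <core name="customerA" instanceDir="customerA"/>
    <core name="customerB" instanceDir="customerB"/>
  </cores>
</solr>
```

[SOLR-1293, mentioned above, is about making very large numbers of such cores manageable, e.g. by not keeping them all loaded at once.]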
Re: Solr / Velocity url rewrite
Velocity/Solritas was never intended to be a user-facing app. How are you locking things down so a user can't enter, for instance, q=<delete><query>*:*</query></delete>&commit=true? I'd really recommend a proper middleware layer unless you have a trusted user base... FWIW, Erick On Tue, Nov 6, 2012 at 4:20 AM, Sébastien Dartigues sebastien.dartig...@gmail.com wrote: Hi all, Today i'm using solritas as front-end for the solr search engine. But i would like to do url rewriting to deliver urls more compliant with SEO. First the end user types that kind of url : http://host.com/query/myquery So this url should be rewriten internally (kind of reverse proxy) in http://localhost:8983/query?q=myquery. This internal url should not be displayed to the end user and in return when the result page is displayed all the links in the page should be rewritten with a SEO compliant url. I tried to perform some tests with an apache front end by using mod_proxy but i didn't succeed to pass url parameters. Does someone ever tried to do SEO with solr search engine (solritas front)? Thanks for your help.
Re: Solr / Velocity url rewrite
Hi Erick, Thanks for your help. OK, aside from the PHP client delivered as a sample, do you have a preference for an out-of-the-box front end that is easily deployable? My main use case is to be compliant with SEO, or at least to give nice (url) entry points. Thanks. 2012/11/6 Erick Erickson erickerick...@gmail.com Velocity/Solritas was never intended to be a user-facing app. How are you locking things down so a user can't enter, for instance, q=<delete><query>*:*</query></delete>&commit=true? I'd really recommend a proper middleware layer unless you have a trusted user base... FWIW, Erick On Tue, Nov 6, 2012 at 4:20 AM, Sébastien Dartigues sebastien.dartig...@gmail.com wrote: Hi all, Today i'm using solritas as front-end for the solr search engine. But i would like to do url rewriting to deliver urls more compliant with SEO. First the end user types that kind of url : http://host.com/query/myquery So this url should be rewriten internally (kind of reverse proxy) in http://localhost:8983/query?q=myquery. This internal url should not be displayed to the end user and in return when the result page is displayed all the links in the page should be rewritten with a SEO compliant url. I tried to perform some tests with an apache front end by using mod_proxy but i didn't succeed to pass url parameters. Does someone ever tried to do SEO with solr search engine (solritas front)? Thanks for your help.
Re: Searching for Partial Words
Add an edge n-gram filter (EdgeNGramFilterFactory) to your index analyzer. This will add all the prefixes of words to the index, so that a query of engi will be equivalent to but much faster than the wildcard engi*. You can specify a minimum size, such as 3 or 4 to eliminate tons of too-short prefixes, if you want. See: http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/ngram/EdgeNGramFilterFactory.html http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/ngram/EdgeNGramTokenFilter.html -- Jack Krupansky -Original Message- From: Sohail Aboobaker Sent: Tuesday, November 06, 2012 8:08 AM To: solr-user@lucene.apache.org Subject: Searching for Partial Words Hi, Given following values in the document: Doc1: Engine Doc2. Engineer Doc3. ResidentEngineer We need to return all three documents when someone searches for engi. Basically we need to implement partial word search. Currently, we have a wild card on the right side of search term (term*). Is it possible to have wild card on both sides of a search term? Regards, Sohail Aboobaker.
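[A field type along the lines Jack describes might look like the sketch below. The type name and gram sizes are illustrative, and the WordDelimiterFilter is an addition beyond his answer: it splits ResidentEngineer on the case change so the Engineer part gets its own prefixes and also matches engi.]

```xml
<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- split camel-case so ResidentEngineer yields Resident + Engineer -->
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- index every front prefix of each token, from 3 characters up -->
    <filter class="solr.EdgeNGramFilterFactory" side="front" minGramSize="3" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <!-- no n-gramming at query time: engi matches an indexed prefix directly -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

[With a field of this type, a plain query for engi should hit Engine, Engineer and ResidentEngineer without any wildcards.]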
Re: Solr 4.0 simultaneous query problem
So is it a better approach to query for smaller rows, say 500, and keep increasing the start parameter? Wouldn't that be slower, since I have an increasing start parameter and I will also be sorting by the same field in each of my queries made to the multiple shards? Also, does it make sense to have all these documents in the same shard? I went for this approach because the shard which is queried the most is small and gives a lot of benefit in terms of time taken for all the stats queries. This shard is only about 5 gb whereas the entire index will be about 50 gb. Thanks for the help, Rohit On Mon, Nov 5, 2012 at 4:02 PM, Walter Underwood wun...@wunderwood.org wrote: Don't query for 5000 documents. That is going to be slow no matter how it is implemented. wunder On Nov 5, 2012, at 1:00 PM, Rohit Harchandani wrote: Hi, So it seems that when I query multiple shards with the sort criteria for 5000 documents, it queries all shards and gets a list of document ids and then adds the document ids to the original query and queries all the shards again. This process of doing the join of query results with the unique ids and getting the remaining fields is turning out to be really slow. It takes a while to search for a list of unique ids. Is there any config change to make this process faster? Also what does isDistrib=false mean when solr generates the queries internally? Thanks, Rohit On Fri, Oct 19, 2012 at 5:23 PM, Rohit Harchandani rhar...@gmail.com wrote: Hi, The same query is fired always for 500 rows. The only thing different is the start parameter. The 3 shards are in the same instance on the same server. They all have the same schema. But the inherent type of the documents is different. Also most of the apps queries goes to shard A which has the smallest index size (4gb). The query is made to a master shard which by default goes to all 3 shards for results. 
(also, the query that i am trying matches documents only in shard A mentioned above) Will try debugQuery now and post it here. Thanks, Rohit On Thu, Oct 18, 2012 at 11:00 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, Maybe you can narrow this down a little further. Are there some queries that are faster and some slower? Is there a pattern? Can you share examples of slow queries? Have you tried debugQuery=true? These 3 shards is each of them on its own server or? Is the slow one always the one that hits the biggest shard? Do they hold the same type of data? How come their sizes are so different? Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Performance Monitoring - http://sematext.com/spm/index.html On Thu, Oct 18, 2012 at 12:22 PM, Rohit Harchandani rhar...@gmail.com wrote: Hi all, I have an application which queries a solr instance having 3 shards(4gb, 13gb and 30gb index size respectively) having 6 million documents in all. When I start 10 threads in my app to make simultaneous queries (with rows=500 and different start parameter, sort on 1 field and no facets) to solr to return 500 different documents in each query, sometimes I see that most of the responses come back within no time (500ms-1000ms), but the last response takes close to 50 seconds (Qtime). I am using the latest 4.0 release. What is the reason for this delay? Is there a way to prevent this? Thanks and regards, Rohit -- Walter Underwood wun...@wunderwood.org
Re: SolrCloud failover behavior
Thanks a million, Erick! You're right about killing both nodes hosting the shard. I'll get the wiki corrected. Nick On 11/3/2012 10:51 PM, Erick Erickson wrote: SolrCloud doesn't work unless every shard has at least one server that is up and running. I _think_ you might be killing both nodes that host one of the shards. The admin page has a link showing you the state of your cluster. So when this happens, does that page show both nodes for that shard being down? And yeah, SolrCloud requires a quorum of ZK nodes up. So with only one ZK node, killing that will bring down the whole cluster. Which is why the usual recommendation is that ZK be run externally and usually an odd number of ZK nodes (three or more). Anyone can create a login and edit the Wiki, so any clarifications are welcome! Best Erick On Sat, Nov 3, 2012 at 12:17 PM, Nick Chase nch...@earthlink.net wrote: I think there's a change in the behavior of SolrCloud vs. what's in the wiki, but I was hoping someone could confirm for me. I checked JIRA and there were a couple of issues requesting partial results if one server comes down, but that doesn't seem to be the issue here. I also checked CHANGES.txt and don't see anything that seems to apply. I'm running Example B: Simple two shard cluster with shard replicas from the wiki at https://wiki.apache.org/solr/SolrCloud and everything starts out as expected. However, when I get to the part about fail over behavior is when things get a little wonky. I added data to the shard running on 7475. If I kill 7500, a query to any of the other servers works fine. But if I kill 7475, rather than getting zero results on a search to 8983 or 8900, I get a 503 error:

<response>
  <lst name="responseHeader">
    <int name="status">503</int>
    <int name="QTime">5</int>
    <lst name="params">
      <str name="q">*:*</str>
    </lst>
  </lst>
  <lst name="error">
    <str name="msg">no servers hosting shard:</str>
    <int name="code">503</int>
  </lst>
</response>

I don't see any errors in the consoles. 
Also, if I kill 8983, which includes the Zookeeper server, everything dies, rather than just staying in a steady state; the other servers continually show:

Nov 03, 2012 11:39:34 AM org.apache.zookeeper.ClientCnxn$SendThread startConnect
INFO: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:9983
Nov 03, 2012 11:39:35 AM org.apache.zookeeper.ClientCnxn$SendThread run
WARNING: Session 0x13ac6cf87890002 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused: no further information
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143)
Nov 03, 2012 11:39:35 AM org.apache.zookeeper.ClientCnxn$SendThread startConnect

over and over again, and a call to any of the servers shows a connection error to 8983. This is the current 4.0.0 release, running on Windows 7. If this is the proper behavior and the wiki needs updating, fine; I just need to know. Otherwise if anybody has any clues as to what I may be missing, I'd be grateful. :) Thanks... --- Nick
Re: lukeall.jar for Solr4r?
Thank you very much for taking the time to do this. This version is able to read the index files, but there is at least one issue: The home screen reports ERROR: can't count terms per field and this exception is thrown: java.util.NoSuchElementException at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1098) at java.util.TreeMap$KeyIterator.next(TreeMap.java:1154) at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010) at org.getopt.luke.IndexInfo.countTerms(IndexInfo.java:64) at org.getopt.luke.IndexInfo.getNumTerms(IndexInfo.java:109) at org.getopt.luke.Luke$3.run(Luke.java:1165) On 11/05/2012 05:08 PM, Shawn Heisey wrote: On 11/5/2012 2:52 PM, Shawn Heisey wrote: No idea whether I did it right, or even whether it works. All my indexes are either 3.5 or 4.1-SNAPSHOT, so I can't actually test it. You can get to the resulting jar and my patch against the luke-4.0.0-ALPHA source: https://dl.dropbox.com/u/97770508/luke-4.0.0-unofficial.patch https://dl.dropbox.com/u/97770508/lukeall-4.0.0-unofficial.jar If you have an immediate need for 4.0.0 support in Luke, please try it out and let me know whether it works. If it doesn't work, or when the official luke 4.0.0 is released, I will remove those files from my dropbox. I just realized that the version I uploaded there was compiled with java 1.7.0_09. I don't know if this is actually a problem, but just in case, I re-did the compile on a machine with 1.6.0_29. The filename referenced above now points to this version and I have included a file that indicates its java7 origins: https://dl.dropbox.com/u/97770508/lukeall-4.0.0-unofficial-java7.jar Thanks, Shawn
custom request handler
Hi, we are extending SearchHandler to provide a custom search request handler. Basically we've added NamedLists called allowed, whiteList, maxMinList, etc. These look like the default, append and invariant NamedLists in the standard search handler config. In handleRequestBody we then remove params not listed in the allowed named list, white-list values as per the white list, and so on. The idea is to have a safe request handler which the big bad world could be exposed to. I'm worried: what have we missed that a front-end app could give us? Also, removing params in SolrParams is a bit clunky. We are basically converting SolrParams into a NamedList, processing a new NamedList from this, and then calling .setParams(SolrParams.toSolrParams(nlNew)). Is there a better way? In particular NamedLists are not set up for key look-ups... Anyway, basically: is having a custom request handler doing the above the way to go? Cheers
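[For comparison, a hedged sketch (handler name, field list and fq are invented): part of this lock-down can be expressed declaratively on a stock SearchHandler, since clients cannot override invariants entries and appends entries are always added:]

```xml
<requestHandler name="/public" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="rows">10</str>
    <str name="df">text</str>
  </lst>
  <!-- invariants always win over client-supplied parameters -->
  <lst name="invariants">
    <str name="fl">id,title,score</str>
  </lst>
  <!-- appends are added to every request on top of what the client sends -->
  <lst name="appends">
    <str name="fq">visibility:public</str>
  </lst>
</requestHandler>
```

[On the clunky-params point, org.apache.solr.common.params.ModifiableSolrParams (copy the request's params, set/remove keys, then call req.setParams on the result) avoids the NamedList round trip.]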
Re: Searching for Partial Words
Thanks Jack. In the configuration below:

<fieldType name="text_edgngrm" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.EdgeNGramTokenizerFactory" side="front" minGramSize="1" maxGramSize="1"/>
  </analyzer>
</fieldType>

What are the possible values for side? If I understand it correctly, minGramSize=3 and side=front will include eng* but not en*. Is this correct? So the minGramSize is the number of characters allowed on the specified side. Does it allow side=both :) or something similar? Regards, Sohail
Re: migrating from solr3 to solr4
I got the following error in browser console: http://localhost:8080/indexer/$%7BcontextPath%7D$%7BadminPath%7D?wt=json We can't see the contents of that link. Could you post it on pastebin.com or something? Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Tue, Nov 6, 2012 at 8:35 AM, Carlos Alexandro Becker caarl...@gmail.com wrote: I got the following error in browser console: http://localhost:8080/indexer/$%7BcontextPath%7D$%7BadminPath%7D?wt=json
Re: migrating from solr3 to solr4
Hi Michael, thanks for your answer. I already posted it on stackoverflow ( http://stackoverflow.com/questions/13236383/migrating-from-solr3-to-solr4 ), but this looks like an encoding issue; actually, it is exactly the error. I'm not sure, but I looked in all the xml files in my JBoss and also in the app, and none mention these variables (contextPath and adminPath) related to solr. So either there is something that I should configure and don't know how, or there is some trouble with the encoding that is escaping the $ and { around the variables (not sure, I didn't find the file where the app variable is populated). Thanks in advance. On Tue, Nov 6, 2012 at 1:49 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: I got the following error in browser console: http://localhost:8080/indexer/$%7BcontextPath%7D$%7BadminPath%7D?wt=json We can't see the contents of that link.. Could you post it on pastebin.com or something? Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Tue, Nov 6, 2012 at 8:35 AM, Carlos Alexandro Becker caarl...@gmail.com wrote: I got the following error in browser console: http://localhost:8080/indexer/$%7BcontextPath%7D$%7BadminPath%7D?wt=json -- Atenciosamente, *Carlos Alexandro Becker* https://profiles.google.com/caarlos0
Re: SolrCloud Tomcat configuration: problems and doubts.
Forward to solr-user mailing list. We forgot to reply to it, :-/ 2012/11/5 Luis Cappa Banda luisca...@gmail.com Hello, Mark! I´ve been testing more and more and things are going better. I have tested what you told me about -Dbootstrap_conf=true and it works fine, but the problem is that if I include that application parameter in every Tomcat instance, then when I deploy all Solr servers each one loads all solrCore configurations into Zookeeper again. There should exist something like a Tomcat master server which alone has the following parameters that define the basic SolrCloud configuration: JAVA_OPTS=-DzkHost=127.0.0.1:9000 -DnumShards=2 -Dbootstrap_conf=true Then the other Tomcat servers should have only: JAVA_OPTS=-DzkHost=127.0.0.1:9000 However, I think that is not the best way to proceed. We are in 2012, it´s the end of the world - God (well, one of them) is angry and attacks my Production environment. Imagine that all servers go down and a Monit service restarts them at random. Maybe one common Tomcat server finishes its startup faster than the named Tomcat master server, so those SolrCloud configuration parameters won´t be loaded at first. That´s a problem. One possibility is to write a simple script to be executed on every Tomcat launch that works something like this: I´m the first Tomcat and I´m launching! I´ll write a solrcloud.config.lock file in a well-known path (or maybe into Zookeeper) to announce to the other Tomcats that I´ll start to load SolrCloud configuration files into Zookeeper. I am the Tomcat master server, so I´ll load *JAVA_OPTS=-DzkHost=127.0.0.1:9000 -DnumShards=2 -Dbootstrap_conf=true*. I´m a second Tomcat and I´m launching! First I check if any solrcloud.config.lock file exists. If it exists, I simply load *JAVA_OPTS=-DzkHost=127.0.0.1:9000* And so on. I don´t like this solution too much because it´s not elegant and it´s very ad-hoc, but it works. What do you think about it? 
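The lock-file handshake described above can be sketched in a few lines; the lock path and class name here are illustrative assumptions. File.createNewFile atomically creates the file only if it does not already exist, which is what makes the first-writer-wins check safe on a local filesystem (coordinating across machines would need a proper ZooKeeper lock recipe instead).

```java
import java.io.File;
import java.io.IOException;

public class BootstrapLock {
    // Returns true if this process won the race and should bootstrap the
    // SolrCloud config into Zookeeper; false if another Tomcat got there first.
    static boolean tryBecomeBootstrapper(File lockFile) throws IOException {
        // createNewFile atomically creates the file only if it does not exist.
        return lockFile.createNewFile();
    }

    public static void main(String[] args) throws IOException {
        File lock = new File(System.getProperty("java.io.tmpdir"),
                             "solrcloud.config.lock");
        lock.delete(); // clean slate for the demo
        boolean first  = tryBecomeBootstrapper(lock); // this "Tomcat" bootstraps
        boolean second = tryBecomeBootstrapper(lock); // a later one just joins
        System.out.println(first + " " + second);     // true false
        lock.deleteOnExit();
    }
}
```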
I´ve just started with SolrCloud four or five days ago and maybe I forgot something that could solve this problem. Thank you very much, Mark. Regards, Luis Cappa. 2012/11/3 Mark Miller markrmil...@gmail.com On Fri, Nov 2, 2012 at 9:05 AM, Luis Cappa Banda luisca...@gmail.com wrote: Hello, Mark! How are you? Thanks a lot for helping me. You were right about the jetty.host parameter. My final test solr.xml looks like:

<cores adminPath="/admin/cores" defaultCoreName="items_en" host="localhost" hostPort="9080" hostContext="items_en">
  <core name="items_en" instanceDir="items_en" />
</cores>

I´ve noticed that the 'hostContext' parameter was also required, so I included it. It should default to /solr if you don't set it - it is there in case you deploy to a different context though. After those corrections the Cloud graph tree looks right, and executing queries doesn't return a 503 error. Phew! However, I checked in the Cloud graph tree that a collection1 appears too, pointing to http://localhost:8983/solr. I will continue testing in case I missed something, but it looks like it is creating another collection with default parameters (collection name, port) without control. It should only create what it finds in solr.xml - let me know what you find. While using Apache Tomcat I was forced to include in catalina.sh (or setenv.sh) the following environment parameters, as I told you before: JAVA_OPTS=-DzkHost=127.0.0.1:9000 -Dcollection.configName=items_en You should only need -DzkHost= - see below. Just three questions more: 1. That´s a problem for me, because I would like to deploy in each Tomcat instance more than one Solr server with different configuration files (I mean, different configName parameters), so including that JAVA_OPTS forces me to deploy in that Tomcat server only Solr servers with this kind of configuration. In a production environment I would like to deploy in a single Tomcat instance at least four Solr servers, one for each kind of documents that I will index and query. 
Do you know any way to configure the configName for each Solr server instance? Is it possible to configure it inside the solr.xml file? Also, it makes sense to deploy in each Solr server a multi-core configuration, each core with its configName allocated in Zookeeper, but again using that kind of JAVA_OPTS on-fire params configuration makes it impossible, :-( That config name sys prop is not being used here - it's only used when you use -Dbootstrap_confdir=path, and then only the first time you start up. Collections are linked to configuration sets in ZooKeeper. If you use -Dbootstrap_conf=true, a special rule is used that auto links collections and config sets with the same name as the collection. Otherwise, you can use the ZkCLi cmd line tool to link any collection to any config in zookeeper. 2. The other question is about indexing. What is the best way to plain index (I
Re: migrating from solr3 to solr4
Hey Carlos just had a quick look at our changes and figured out the revision which introduced this change, which might help you while having another look? http://svn.apache.org/viewvc?view=revision&revision=1297578 The LoadAdminUiServlet is responsible for replacing those placeholders which are causing your problems HTH at least a bit Stefan On Tuesday, November 6, 2012 at 5:02 PM, Carlos Alexandro Becker wrote: just found this in the admin.html head: https://gist.github.com/4025669 On Tue, Nov 6, 2012 at 1:57 PM, Carlos Alexandro Becker caarl...@gmail.com (mailto:caarl...@gmail.com) wrote: Hi Michael, thank for your answer. I already posted it in stackoverflow ( http://stackoverflow.com/questions/13236383/migrating-from-solr3-to-solr4 ), but, this looks like a encoding issue, actually, is exactly the error. I'm not sure, but I look in all xml files in my JBoss and also in app, neither mention this variables (contextPath and adminPath) related to solr. So, or there is something that I should configure and don't know how, or some trouble with the encoding that are escaping the $ and { around the var (not sure, I didn't find the file where the app variable is populated). Thanks in advance. On Tue, Nov 6, 2012 at 1:49 PM, Michael Della Bitta michael.della.bi...@appinions.com (mailto:michael.della.bi...@appinions.com) wrote: I got the following error in browser console: http://localhost:8080/indexer/$%7BcontextPath%7D$%7BadminPath%7D?wt=json We can't see the contents of that link.. Could you post it on pastebin.com (http://pastebin.com) or something? 
Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com (http://www.appinions.com) Where Influence Isn’t a Game On Tue, Nov 6, 2012 at 8:35 AM, Carlos Alexandro Becker caarl...@gmail.com (mailto:caarl...@gmail.com) wrote: I got the following error in browser console: http://localhost:8080/indexer/$%7BcontextPath%7D$%7BadminPath%7D?wt=json -- Atenciosamente, *Carlos Alexandro Becker* https://profiles.google.com/caarlos0 -- Atenciosamente, *Carlos Alexandro Becker* https://profiles.google.com/caarlos0
Re: lukeall.jar for Solr4r?
On 11/6/2012 7:45 AM, Carrie Coy wrote: Thank you very much for taking the time to do this. This version is able to read the index files, but there is at least one issue: The home screen reports ERROR: can't count terms per field and this exception is thrown: java.util.NoSuchElementException at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1098) at java.util.TreeMap$KeyIterator.next(TreeMap.java:1154) at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010) at org.getopt.luke.IndexInfo.countTerms(IndexInfo.java:64) at org.getopt.luke.IndexInfo.getNumTerms(IndexInfo.java:109) at org.getopt.luke.Luke$3.run(Luke.java:1165) That particular change, around IndexInfo.java line 64 (and a few other locations as well), is the one part of my changes that I actually had confidence in. I have no idea how to fix it. I'll go ahead and remove the jars from my dropbox, since they don't work. Thanks, Shawn
Re: GC stalls cause Zookeeper timeout during uninvert for facet field
On Nov 6, 2012 at 6:06 AM, Arend-Jan Wijtzes ajwyt...@wise-guys.nl wrote: ... During the uninvert phase of this text field the searchers experience long stalls because of the garbage collecting (20+ second pauses) which causes Solr to lose the Zookeeper lease. Often they do not recover gracefully and as a result the cluster becomes degraded: SEVERE: There was a problem finding the leader in zk: org.apache.solr.common.SolrException: Could not get leader props This is a known open issue. warning: commercial product mention follows Using the Zing JVM is a simple, immediate way to get around this and other known GC-related issues. Zing eliminates GC pauses as a concern for enterprise applications such as this, driving worst-case JVM-related hiccups down to the milliseconds level. This behavior will tend to happen out-of-the-box, with little or no tuning, and at any heap size your server can support. For example, on the specific server configurations you mention (24 vcores, 48GB of RAM) you should be able to comfortably run with a -Xmx of 30GB and no longer worry about pauses. We've had people run much larger than that (e.g. http://blog.mikemccandless.com/2012/07/lucene-index-in-ram-with-azuls-zing-jvm.html). In full disclosure, I work for (and am the CTO at) Azul. -- Gil.
Re: migrating from solr3 to solr4
Hi Stefan, Thank you very much. I just realized that I didn't update the web.xml, so I didn't have the LoadAdminUiServlet configured; that's why it was not working. By now, the only problem I still have is that it tries to access solr.home/collection1/conf, and I used to have it in solr.home/conf. How can I fix this? Thank you very much for your help. On Tue, Nov 6, 2012 at 3:01 PM, Stefan Matheis matheis.ste...@gmail.com wrote: Hey Carlos just had a quick look at our changes and figured out the revision which introduced this change, which might help you while having another look? http://svn.apache.org/viewvc?view=revision&revision=1297578 The LoadAdminUiServlet is responsible for replacing those placeholders which are causing your problems HTH at least a bit Stefan On Tuesday, November 6, 2012 at 5:02 PM, Carlos Alexandro Becker wrote: just found this in the admin.html head: https://gist.github.com/4025669 On Tue, Nov 6, 2012 at 1:57 PM, Carlos Alexandro Becker caarl...@gmail.com (mailto:caarl...@gmail.com) wrote: Hi Michael, thank for your answer. I already posted it in stackoverflow ( http://stackoverflow.com/questions/13236383/migrating-from-solr3-to-solr4 ), but, this looks like a encoding issue, actually, is exactly the error. I'm not sure, but I look in all xml files in my JBoss and also in app, neither mention this variables (contextPath and adminPath) related to solr. So, or there is something that I should configure and don't know how, or some trouble with the encoding that are escaping the $ and { around the var (not sure, I didn't find the file where the app variable is populated). Thanks in advance. On Tue, Nov 6, 2012 at 1:49 PM, Michael Della Bitta michael.della.bi...@appinions.com (mailto: michael.della.bi...@appinions.com) wrote: I got the following error in browser console: http://localhost:8080/indexer/$%7BcontextPath%7D$%7BadminPath%7D?wt=json We can't see the contents of that link.. 
Could you post it on pastebin.com (http://pastebin.com) or something? Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com (http://www.appinions.com) Where Influence Isn’t a Game On Tue, Nov 6, 2012 at 8:35 AM, Carlos Alexandro Becker caarl...@gmail.com (mailto:caarl...@gmail.com) wrote: I got the following error in browser console: http://localhost:8080/indexer/$%7BcontextPath%7D$%7BadminPath%7D?wt=json -- Atenciosamente, *Carlos Alexandro Becker* https://profiles.google.com/caarlos0 -- Atenciosamente, *Carlos Alexandro Becker* https://profiles.google.com/caarlos0 -- Atenciosamente, *Carlos Alexandro Becker* http://caarlos0.github.com/about
Re: Reply:Re: Where to get more documents or references about solr cloud?
Hi, On Mon, Nov 5, 2012 at 8:24 PM, SuoNayi suonayi2...@163.com wrote: Thanks jack and thanks for the great country. All big famous websites such as google, slideshares and blogspot etc are blocked. What I want to know about is more details about solrcloud, here are my questions: 1. Can we control the relocation of shard / replica dynamically? Don't think so, if you mean manually. 2. Can we move a shard between solr instances? SolrCloud does this, there is no manual moving option now. 3. Is one solr instance related to one shard / replica? A single shard or a single replica of a shard lives on just 1 Solr server. A replica of a shard on server A can and should be on server B. 4. What's the sharding key algorithm? Hashing on the doc key and # of nodes, I believe. 5. Does it support a custom sharding key? Not yet, I believe. See http://search-lucene.com/?q=cloud+sharding&fc_project=Solr&fc_type=mail+_hash_+user&fc_type=jira Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Performance Monitoring - http://sematext.com/spm/index.html At 2012-11-05 20:44:46, Jack Krupansky j...@basetechnology.com wrote: Is most of the Web blocked in your location? When I Google SolrCloud, Google says that there are About 61,400 results with LOTS of informative links, including blogs, videos, slideshares, etc. just on the first two pages of search results alone. If you have specific questions, please ask them with specific detail, but try reading a few of the many sources of information available on the Web first. -- Jack Krupansky -Original Message- From: SuoNayi Sent: Monday, November 05, 2012 3:32 AM To: solr-user@lucene.apache.org Subject: Where to get more documents or references about solr cloud? Hi all, there is only one entry about solr cloud on the wiki, http://wiki.apache.org/solr/SolrCloud. I have googled a lot and found no more details about solr cloud, or maybe I missed something?
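To illustrate the hashing answer to question 4: Solr 4.0 gives each shard a contiguous slice of a 32-bit hash space (as seen later in this thread, with two shards the ranges are 00000000-7fffffff and 80000000-ffffffff) and routes a document by hashing its unique key into that space. The sketch below shows only the range arithmetic; String.hashCode is a stand-in for the real hash function, and the class name is illustrative.

```java
public class ShardRouter {
    // Illustrative only: splits the 32-bit hash space into numShards
    // contiguous ranges and returns which range the doc key hashes into.
    static int shardFor(String docId, int numShards) {
        long hash = docId.hashCode() & 0xffffffffL;              // unsigned 32-bit
        long rangeSize = (0x100000000L + numShards - 1) / numShards; // ceil split
        return (int) (hash / rangeSize);
    }

    public static void main(String[] args) {
        for (String id : new String[] {"doc1", "doc2", "doc3"}) {
            System.out.println(id + " -> shard " + shardFor(id, 2));
        }
    }
}
```

The key property is that routing depends only on the key and the (fixed) number of shards, which is also why adding a shard later changes every document's target range.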
New Index directory regardless of Solr.xml
I have a five-node SolrCloud implementation running as a test with no replication, using a three-node zookeeper ensemble. Admittedly, I'm new to Solr and just grinding it out. I accidentally re-initialized zookeeper with the wrong conf dir and I'm trying to recover. I re-ran the initialization with the correct conf dir, but now the indexes are reporting 0 documents. Logs also report that a new index was created in the dataDir called index. Previous indexes were in a named directory based on slice/shard. The previous indexes don't appear to have any issues, I just can't re-point the solr cores to them. The Solr.xml file for one of the servers is:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores adminPath="/admin/cores" hostPort="8502">
    <core schema="schema.xml" shard="slice5" instanceDir="test_s5s1/" name="twitter_s5s1" config="solrconfig.xml" collection="test"/>
  </cores>
</solr>

I think I'm missing exactly what the instanceDir provides. These were the directories created when I first set up the servers and where the indexes exist that I want to use. Any thoughts? Or am I just completely off base here in my description of the issue? Chris
RE: Access DIH from inside application (via Solrj)?
DIH and SolrJ don't really support what you want to do. But you can make it work with code like this, which reloads the DIH configuration and checks for the response. Just note this is quite brittle: whenever the response changes in future versions of DIH, it'll break your code.

Map<String, String> paramMap = new HashMap<String, String>();
paramMap.put("command", "reload-config");
SolrParams params = new MapSolrParams(paramMap);
DirectXmlRequest req = new DirectXmlRequest("/dataimporthandler", null);
req.setMethod(METHOD.GET);
req.setParams(params);
NamedList<Object> nl = server.request(req);
String importResponse = (String) nl.get("importResponse");
boolean reloaded = false;
if ("Configuration Re-loaded sucessfully.".equals(importResponse)) {
    reloaded = true;
}

James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Billy Newman [mailto:newman...@gmail.com] Sent: Tuesday, November 06, 2012 3:00 PM To: solr-user@lucene.apache.org Subject: Access DIH from inside application (via Solrj)? I know that you can access the DIH interface restfully, which works pretty well for most cases. I would like to know however if it is possible to send/receive commands to/from a DIH via the SolrJ library. Basically I would just like to be able to kick off the DIH and maybe check status. I can work around this but Java is not the best client for handling/dealing with http/xml. If I could use SolrJ the code would probably be a lot more straightforward. Thanks guys/gals, Billy
Re: Solr / Velocity url rewrite
Not really. Mostly it's whatever you are most comfortable with. Since the app - solr connection is just HTTP, the front-end is wide open. FWIW, Erick On Tue, Nov 6, 2012 at 8:30 AM, Sébastien Dartigues sebastien.dartig...@gmail.com wrote: Hi Erick, Thanks for your help. OK, except the php client delivered as a sample, do you have a preference for an out-of-the-box front end that is easily deployable? My main use case is to be compliant with SEO, or at least to give nice (url) entry points. Thanks. 2012/11/6 Erick Erickson erickerick...@gmail.com Velocity/Solritas was never intended to be a user-facing app. How are you locking things down so a user can't enter, for instance, q=<delete><query>*:*</query></delete>&commit=true? I'd really recommend a proper middleware layer unless you have a trusted user base... FWIW, Erick On Tue, Nov 6, 2012 at 4:20 AM, Sébastien Dartigues sebastien.dartig...@gmail.com wrote: Hi all, Today i'm using solritas as front-end for the solr search engine. But i would like to do url rewriting to deliver urls more compliant with SEO. First the end user types that kind of url: http://host.com/query/myquery So this url should be rewritten internally (kind of reverse proxy) into http://localhost:8983/query?q=myquery. This internal url should not be displayed to the end user, and in return when the result page is displayed all the links in the page should be rewritten with a SEO-compliant url. I tried to perform some tests with an apache front end by using mod_proxy but i didn't succeed in passing url parameters. Has someone ever tried to do SEO with the solr search engine (solritas front)? Thanks for your help.
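The rewrite the original poster wants (public /query/myquery to internal /query?q=myquery) is mostly query-string escaping, which may be what tripped up the mod_proxy attempt. A middleware layer of the kind Erick suggests could build the internal URL with a helper of this shape; the class name and the localhost:8983 target are assumptions taken from the example URLs in the thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class SeoUrlMapper {
    // Rewrites a public SEO path like "/query/solr cloud" into the internal
    // Solr query URL, escaping the search terms for the query string.
    static String toInternalUrl(String publicPath) throws UnsupportedEncodingException {
        String prefix = "/query/";
        if (!publicPath.startsWith(prefix)) {
            throw new IllegalArgumentException("not an SEO search path: " + publicPath);
        }
        String query = publicPath.substring(prefix.length());
        return "http://localhost:8983/query?q=" + URLEncoder.encode(query, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toInternalUrl("/query/solr cloud"));
        // http://localhost:8983/query?q=solr+cloud
    }
}
```

A servlet filter or reverse proxy would call something of this shape on the way in, and apply the inverse mapping to links in the rendered result page on the way out.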
Re: SolrCloud failover behavior
I was right for once G.. Thanks for updating the Wiki! Erick On Tue, Nov 6, 2012 at 9:42 AM, Nick Chase nch...@earthlink.net wrote: Thanks a million, Erick! You're right about killing both nodes hosting the shard. I'll get the wiki corrected. Nick On 11/3/2012 10:51 PM, Erick Erickson wrote: SolrCloud doesn't work unless every shard has at least one server that is up and running. I _think_ you might be killing both nodes that host one of the shards. The admin page has a link showing you the state of your cluster. So when this happens, does that page show both nodes for that shard being down? And yeah, SolrCloud requires a quorum of ZK nodes up. So with only one ZK node, killing that will bring down the whole cluster. Which is why the usual recommendation is that ZK be run externally, usually with an odd number of ZK nodes (three or more). Anyone can create a login and edit the Wiki, so any clarifications are welcome! Best Erick On Sat, Nov 3, 2012 at 12:17 PM, Nick Chase nch...@earthlink.net wrote: I think there's a change in the behavior of SolrCloud vs. what's in the wiki, but I was hoping someone could confirm for me. I checked JIRA and there were a couple of issues requesting partial results if one server comes down, but that doesn't seem to be the issue here. I also checked CHANGES.txt and don't see anything that seems to apply. I'm running Example B: Simple two shard cluster with shard replicas from the wiki at https://wiki.apache.org/solr/SolrCloud and everything starts out as expected. However, when I get to the part about fail over behavior is when things get a little wonky. I added data to the shard running on 7475. If I kill 7500, a query to any of the other servers works fine. 
But if I kill 7475, rather than getting zero results on a search to 8983 or 8900, I get a 503 error:

<response>
  <lst name="responseHeader">
    <int name="status">503</int>
    <int name="QTime">5</int>
    <lst name="params">
      <str name="q">*:*</str>
    </lst>
  </lst>
  <lst name="error">
    <str name="msg">no servers hosting shard:</str>
    <int name="code">503</int>
  </lst>
</response>

I don't see any errors in the consoles. Also, if I kill 8983, which includes the Zookeeper server, everything dies, rather than just staying in a steady state; the other servers continually show:

Nov 03, 2012 11:39:34 AM org.apache.zookeeper.ClientCnxn$SendThread startConnect
INFO: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:9983
Nov 03, 2012 11:39:35 AM org.apache.zookeeper.ClientCnxn$SendThread run
WARNING: Session 0x13ac6cf87890002 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused: no further information
  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
  at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
  at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143)
Nov 03, 2012 11:39:35 AM org.apache.zookeeper.ClientCnxn$SendThread startConnect

over and over again, and a call to any of the servers shows a connection error to 8983. This is the current 4.0.0 release, running on Windows 7. If this is the proper behavior and the wiki needs updating, fine; I just need to know. Otherwise if anybody has any clues as to what I may be missing, I'd be grateful. :) Thanks... --- Nick
Re: load balance with SolrCloud
This is a complex setup, all right. A pluggable sharding strategy is definitely something that is on the roadmap for SolrCloud, but hasn't made it into the code base yet. Keep in mind, though, that all the SolrCloud goodness centers around the idea of a single index that may be sharded. I don't think SolrCloud has had time to really think about handling the situation in which you have a bunch of cores that may or may not be sharded but are running on the same server. I don't know that it _doesn't_ work, mind you, but that scenario doesn't seem like the prime use-case for SolrCloud. That said, I don't know that such a situation is _not_ do-able in SolrCloud. Mostly I haven't explored that kind of functionality yet. Not much help, I know. I suspect that this is one of those cases where _we_ will learn from _you_ if you try to meld SolrCloud with your setup. Sounds like a great Wiki page if you do pursue this! Best Erick On Tue, Nov 6, 2012 at 4:58 PM, Jie Sun jsun5...@yahoo.com wrote: Hi Erick, thanks for your information. I read all the related issues with SOLR-1293 as you just pointed me to. It seems they are not very suitable for our scenario. We do have a couple of hundred cores (you are right, each customer corresponds to a core), typically on one solr instance, and all of them need to be actively working with indexing and queries. So we do not have tens of thousands of cores where only part of them need to be loaded. Our issue is on some servers that host very large customers: they run out of disk space after some time due to the large amount of index data. I have written a restful service that is deployed with solr on tomcat to identify the large customer (core) indexing requests and consult with a dns service; it then off-loads the indexing requests to additional solr servers, and supports queries using solr shards on these servers going forward. 
We also have replicas for each shard, managed by our own software using a peer model (I am thinking about using solr replication after 1.4). To me, SolrCloud is like sharding+replication+zookeeper. I could be wrong. But if I am right, with very big existing data in our service, and since we already have a lot of software in place working pretty well utilizing solr 1.4, I am just trying to figure out if it will be worth it to migrate the production system to use SolrCloud. The problem we need to fix is in one area: I need to automate the off-load (sharding) process. Right now we use a monitoring system to watch for the growth on each server. When we find a fast-growing large core (customer), we start to manually configure our dns directory and start adding shard(s) to it (basically we create the same core name on a different solr server/instance). My restful service going forward will then direct the queries for the customer onto these sharded cores using solr shards. If SolrCloud can not really help me automate this process, it is not very attractive to me right now. I have read some of the topics; I looked into distributed indexing, the distributed update processor ... none of them can help the way I have been looking for. So I guess using solrcloud or not, I will need to write my own kind of 'load balancer' for indexing, unless I am wrong. I did come across Jon's white paper on Loggly, and I have designed a model based on what he has done. The solution should be able to automatically create shards, but it will need to rsync index files for a core to a different server and use solr merge to merge small cores into larger cores, or use core admin to add a new core on the fly. Does this approach sound like something someone is already familiar with and has an out-of-the-box solution for? When I looked into solrcloud, I was expecting some pluggable index-distributing policy factory I could customize. 
The closest thing I found was SOLR-2593 (A new core admin action 'split' for splitting index ) but not exactly what I wanted. Let me know if you can advice me on this more. thanks Jie -- View this message in context: http://lucene.472066.n3.nabble.com/load-balance-with-SolrCloud-tp4018367p4018609.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Add new shard will be treated as replicas in Solr4.0?
got it. thanks a lot On Tue, Nov 6, 2012 at 8:43 PM, Erick Erickson erickerick...@gmail.comwrote: bq: where can i find all the items on the road map? Well, you really can't G... There's no official roadmap. I happen to know this since I follow the developer's list and I've seen references to this being important to the folks doing SolrCloud development work and it's been a recurring theme on the user's list. It's one of those things that _everybody_ understands would be useful in certain circumstances, but haven't had time to actually implement yet. You can track this at: https://issues.apache.org/jira/browse/SOLR-2592 Best Erick On Mon, Nov 5, 2012 at 7:57 PM, Zeng Lames lezhi.z...@gmail.com wrote: btw, where can i find all the items in the road map? thanks! On Tue, Nov 6, 2012 at 8:55 AM, Zeng Lames lezhi.z...@gmail.com wrote: hi Erick, thanks for your kindly response. hv got the information from the SolrCloud wiki. think we may need to defined the shard numbers when we really rollout it. thanks again On Mon, Nov 5, 2012 at 8:40 PM, Erick Erickson erickerick...@gmail.com wrote: Not at present. What you're interested in is shard splitting which is certainly on the roadmap but not implemented yet. To expand the number of shards you'll have to reconfigure, then re-index. Best Erick On Mon, Nov 5, 2012 at 4:09 AM, Zeng Lames lezhi.z...@gmail.com wrote: Dear All, we have an existing solr collection, 2 shards, numOfShard is 2. and there are already records in the index files. now we start another solr instance with ShardId= shard3, and found that Solr treat it as replicas. check the zookeeper data, found the range of shard doesn't change correspondingly. shard 1 is 0-7fff, while shard 2 is 8000-. is there any way to increase new shard for existing collection? thanks a lot! Lames
Re: How to re-read the config files in Solr, on a commit
Erick, we have a requirement where a search admin can add or remove some synonyms and would want these changes to be reflected in search thereafter. Yes, we looked at the reload command and it seems to be suitable for that purpose. We have a master and slave setup, so it should be OK to issue the reload command on the master. I expect that slaves will pull the latest config files. Is the reload operation very costly, in terms of time and cpu? We have a multicore setup and would need to issue reload on multiple cores. Thanks Saroj On Tue, Nov 6, 2012 at 5:02 AM, Erick Erickson erickerick...@gmail.com wrote: Not that I know of. This would be extremely expensive in the usual case. Loading up configs, reconfiguring all the handlers etc. would add a huge amount of overhead to the commit operation, which is heavy enough as it is. What's the use-case here? Changing your configs really often and reading them on commit sounds like a way to make for a very confusing application! But if you really need to re-read all this info on a running system, consider the core admin RELOAD command. Best Erick On Mon, Nov 5, 2012 at 8:43 PM, roz dev rozde...@gmail.com wrote: Hi All I am keen to find out if Solr exposes any event listener or other hooks which can be used to re-read configuration files. I know that we have firstSearcher event but I am not sure if it causes request handlers to reload themselves and read the conf files again. For example, if I change the synonym file and solr gets a commit, will it re-initialize request handlers and re-read the conf files. Or, are there some events which can be listened to? Any inputs are welcome. Thanks Saroj
Re: load balance with SolrCloud
Thanks for your feedback Erick. I am also aware of the current limitation that the shard number in a collection is fixed; changing the number will need re-config and re-indexing. Let's say the limitation gets lifted in a near-future release: I would then consider setting up a collection for each customer, which would include a varying number of shards and their replicas (depending on the customer size, and it should grow dynamically). So this will lead to having multiple collections on one solr server instance... I assume setting up n collections on one server is not an issue? Or is it? I am skeptical; see the example on the solr wiki below, it seems it is starting a solr instance with a specific collection and its config: cd example java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar thanks Jie -- View this message in context: http://lucene.472066.n3.nabble.com/load-balance-with-SolrCloud-tp4018367p4018659.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to re-read the config files in Solr, on a commit
Hi, Note about modifying synonyms - you need to reindex, really, if using index-time synonyms. And if you're using search-time synonyms you have multi-word synonym issue described on the Wiki. Otis -- Performance Monitoring - http://sematext.com/spm On Nov 6, 2012 11:02 PM, roz dev rozde...@gmail.com wrote: Erick We have a requirement where seach admin can add or remove some synonyms and would want these changes to be reflected in search thereafter. yes, we looked at reload command and it seems to be suitable for that purpose. We have a master and slave setup so it should be OK to issue reload command on master. I expect that slaves will pull the latest config files. Is reload operation very costly, in terms of time and cpu? We have a multicore setup and would need to issue reload on multiple cores. Thanks Saroj On Tue, Nov 6, 2012 at 5:02 AM, Erick Erickson erickerick...@gmail.com wrote: Not that I know of. This would be extremely expensive in the usual case. Loading up configs, reconfiguring all the handlers etc. would add a huge amount of overhead to the commit operation, which is heavy enough as it is. What's the use-case here? Changing your configs really often and reading them on commit sounds like a way to make for a very confusing application! But if you really need to re-read all this info on a running system, consider the core admin RELOAD command. Best Erick On Mon, Nov 5, 2012 at 8:43 PM, roz dev rozde...@gmail.com wrote: Hi All I am keen to find out if Solr exposes any event listener or other hooks which can be used to re-read configuration files. I know that we have firstSearcher event but I am not sure if it causes request handlers to reload themselves and read the conf files again. For example, if I change the synonym file and solr gets a commit, will it re-initialize request handlers and re-read the conf files. Or, are there some events which can be listened to? Any inputs are welcome. Thanks Saroj
Two questions about solrcloud
Hi all, sorry for newbie questions about SolrCloud. Here are my two questions:

1. If I have a SolrCloud cluster with two shards and 0 replicas on two different servers, when one of the servers restarts, will the Solr instance on that server replay the transaction log to make sure those operations are persisted to the index files (i.e., commit the transaction log)?

2. Assuming I have a 3-shard cluster on 4 different servers, it will form a cluster with 3 shards and 1 replica. Can I remove one server to reduce the number of servers? If so, do I just need to shut down the server and manually remove its node from ZK?

Regards,
SuoNayi
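For context on question 1: Solr only keeps a transaction log (tlog) to replay on restart if the update log is enabled in solrconfig.xml; it is enabled in the stock SolrCloud example config. A sketch of the relevant section, using the example's default directory expression:

```xml
<!-- solrconfig.xml: enables the transaction log used for recovery.
     On restart, uncommitted entries in the tlog are replayed. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
</updateHandler>
```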
Re: Solr Replication is not Possible on RAMDirectory?
Erik Hatcher-4 wrote:
> There's an open issue (with a patch!) that enables this, it seems: <https://issues.apache.org/jira/browse/SOLR-3911>
> Erik

Well, the patch does not seem to do that... I have tried it and am still getting some error lines about the directory types.

- Zeki: but it doesn't work... It would if it worked...
Re: How to re-read the config files in Solr, on a commit
Thanks Otis for pointing this out. We may end up using search-time synonyms for single-word synonyms and index-time synonyms for multi-word synonyms.

-Saroj

On Tue, Nov 6, 2012 at 8:09 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:

> Hi,
>
> Note about modifying synonyms - you need to reindex, really, if you're using index-time synonyms. And if you're using search-time synonyms, you have the multi-word synonym issue described on the Wiki.
>
> Otis
> --
> Performance Monitoring - http://sematext.com/spm
>
> On Nov 6, 2012 11:02 PM, roz dev rozde...@gmail.com wrote:
>
>> Erick,
>>
>> We have a requirement where a search admin can add or remove synonyms and wants these changes to be reflected in search thereafter. Yes, we looked at the RELOAD command and it seems suitable for that purpose. We have a master/slave setup, so it should be OK to issue the RELOAD command on the master; I expect that the slaves will pull the latest config files.
>>
>> Is the reload operation very costly, in terms of time and CPU? We have a multicore setup and would need to issue RELOAD on multiple cores.
>>
>> Thanks
>> Saroj
>>
>> On Tue, Nov 6, 2012 at 5:02 AM, Erick Erickson erickerick...@gmail.com wrote:
>>
>>> Not that I know of. This would be extremely expensive in the usual case. Loading up configs, reconfiguring all the handlers etc. would add a huge amount of overhead to the commit operation, which is heavy enough as it is.
>>>
>>> What's the use-case here? Changing your configs really often and reading them on commit sounds like a way to make for a very confusing application! But if you really need to re-read all this info on a running system, consider the core admin RELOAD command.
>>>
>>> Best
>>> Erick
>>>
>>> On Mon, Nov 5, 2012 at 8:43 PM, roz dev rozde...@gmail.com wrote:
>>>
>>>> Hi All,
>>>>
>>>> I am keen to find out if Solr exposes any event listener or other hooks which can be used to re-read configuration files. I know that we have the firstSearcher event, but I am not sure if it causes request handlers to reload themselves and read the conf files again. For example, if I change the synonym file and Solr gets a commit, will it re-initialize the request handlers and re-read the conf files? Or are there some events which can be listened to?
>>>>
>>>> Any inputs are welcome.
>>>>
>>>> Thanks
>>>> Saroj
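The split described above (index-time for multi-word synonyms, query-time for single-word synonyms) maps onto two analyzer chains for the same field type in schema.xml. A minimal sketch - the field type name and the two synonym file names are hypothetical:

```xml
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Multi-word synonyms expanded at index time;
         changing this file requires a full reindex. -->
    <filter class="solr.SynonymFilterFactory" synonyms="multi-word-synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Single-word synonyms applied at query time;
         a core RELOAD picks up changes to this file. -->
    <filter class="solr.SynonymFilterFactory" synonyms="single-word-synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```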