Re: how to index 20 MB plain-text xml
Hi! I had the same issue with XML files. Even small XML files produced OOM exceptions. I read that the way XMLs are parsed can sometimes blow up memory requirements to the point that Java runs out of heap. My solutions were: 1. Don't parse XML files 2. Parse only small XML files and hope for the best 3. Give Solr the largest possible Java heap size (and hope for the best) But then again, one time I also got an OOM exception with Word documents - it turned out that some user had pasted 400 MB worth of photos into a Word file. Regards, Primoz From: Floyd Wu To: solr-user@lucene.apache.org Date: 31.03.2014 08:18 Subject:Re: how to index 20 MB plain-text xml Hi Alex, Thanks for your response. Personally I don't want to feed these big XMLs to Solr, but the users want it. I'll try your suggestions later. Many thanks. Floyd 2014-03-31 13:44 GMT+08:00 Alexandre Rafalovitch : > Without digging too deep into why exactly this is happening, here are > the general options: > > 0. Are you actually committing? Check the messages in the logs and see > if the records show up when you expect them to. > 1. Are you actually trying to feed a 20 MB file to Solr? Maybe it's the HTTP > buffer that's blowing up? Try using stream.file instead (note the > security warning though): http://wiki.apache.org/solr/ContentStream > 2. Split the file into smaller ones and commit each separately > 3. Set a hard auto-commit in solrconfig.xml based on the number of documents > to flush in-memory structures to disk > 4. Switch to using DataImportHandler to pull from XML instead of pushing > 5. Increase the amount of memory given to Solr (-Xmx command line flag) > > Regards, > Alex. > > Personal website: http://www.outerthoughts.com/ > Current project: http://www.solr-start.com/ - Accelerating your Solr > proficiency > > On Mon, Mar 31, 2014 at 12:00 PM, Floyd Wu wrote: > > I have many plain-text XML files that I transform into Solr XML format. > > But every time I send them to Solr, I hit an OOM exception.
> > How to configure solr to "eat" these big xml? > > Please guide me a way. Thanks > > > > floyd >
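For option 2 from the list above (splitting the file), here is a minimal client-side sketch, assuming the input is a standard Solr <add> file full of <doc> elements; the batch size of 500 is an arbitrary example value:

```python
# Sketch of splitting one big Solr <add> XML file into smaller batches,
# so each POST (and each commit) stays small. Paths and the batch size
# are illustrative, not from the original thread.
import xml.etree.ElementTree as ET

def split_add_file(source_path, batch_size=500):
    """Yield serialized <add> chunks containing at most batch_size <doc>s."""
    root = ET.parse(source_path).getroot()  # expects a Solr <add> root element
    docs = root.findall("doc")
    for i in range(0, len(docs), batch_size):
        batch = ET.Element("add")
        batch.extend(docs[i:i + batch_size])
        yield ET.tostring(batch, encoding="unicode")
```

Each yielded chunk can then be POSTed to /update as its own request, keeping both the HTTP buffer and the uncommitted in-memory state on the Solr side small.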
Re: Updating an entry in Solr
Yes, that's correct. You can also update a document "per field", but all fields need to be stored=true, because Solr (version >= 4.0) first fetches your document from the index, creates a new document with the modified field, and adds it back to the index... Primoz From: gohome190 To: solr-user@lucene.apache.org Date: 13.11.2013 14:39 Subject:Re: Updating an entry in Solr Okay, so I've found in the Solr tutorial that if you do a POST command and post a new entry with the same uniqueKey (in my case, id_) as an entry already in the index, Solr will automatically replace it for you. That seems to be what I need, right? -- View this message in context: http://lucene.472066.n3.nabble.com/Updating-an-entry-in-Solr-tp4100674p4100675.html Sent from the Solr - User mailing list archive at Nabble.com.
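As an aside, the per-field ("atomic") update described above is driven by a small JSON document with a set/add/inc modifier on the field. A sketch of building one, using the id_ uniqueKey from this thread; the document id and the title field are hypothetical examples:

```python
# Sketch of a Solr 4.x atomic-update payload: only the named field is
# changed, the rest of the document is rebuilt from stored fields (which
# is why stored=true is required). Field names here are illustrative.
import json

def atomic_update(doc_id, field, value, modifier="set"):
    """Build one atomic-update document; modifier is "set", "add", or "inc"."""
    return json.dumps([{"id_": doc_id, field: {modifier: value}}])

payload = atomic_update("doc-42", "title", "New title")
# POST this to /update with Content-Type: application/json
```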
Re: Adding a server to an existing SOLR cloud cluster
According to the wiki pages it should, but I have not really tried it yet - I like to do the "bookkeeping" myself :) I am sorry, but someone with more knowledge of Solr will have to answer your question. Primoz From: ade-b To: solr-user@lucene.apache.org Date: 11.11.2013 15:44 Subject:Re: Adding a server to an existing SOLR cloud cluster Thanks. If I understand what you are saying, it should automatically register itself with the existing cluster if we start SOLR with the correct command line options. We tried adding the numShards option to the command line but still get the same outcome. We start the new SOLR server using /usr/bin/java -Djava.util.logging.config.file=/mnt/ephemeral/apache-tomcat-7.0.47/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -server -Xms256m -Xmx1024m -XX:+DisableExplicitGC -Dsolr.solr.home=/mnt/ephemeral/solr -Dport=8080 -DhostContext=solr -DnumShards=1 -DzkClientTimeout=15000 -DzkHost= -Djava.endorsed.dirs=/mnt/ephemeral/apache-tomcat-7.0.47/endorsed -classpath /mnt/ephemeral/apache-tomcat-7.0.47/bin/bootstrap.jar:/mnt/ephemeral/apache-tomcat-7.0.47/bin/tomcat-juli.jar -Dcatalina.base=/mnt/ephemeral/apache-tomcat-7.0.47 -Dcatalina.home=/mnt/ephemeral/apache-tomcat-7.0.47 -Djava.io.tmpdir=/mnt/ephemeral/apache-tomcat-7.0.47/temp org.apache.catalina.startup.Bootstrap start Regards Ade -- View this message in context: http://lucene.472066.n3.nabble.com/Adding-a-server-to-an-existing-SOLR-cloud-cluster-tp4100275p4100286.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Adding a server to an existing SOLR cloud cluster
Try manually creating shard replicas on the new server. I think the new server is only used automatically when you start your Solr server instance with the "correct command line" option (aka -DnumShards) - I never liked this kind of behaviour. The server is not present in the clusterstate.json file because it contains no replicas - but it is a live node, as you have already stated. Best regards, Primoz From: ade-b To: solr-user@lucene.apache.org Date: 11.11.2013 14:48 Subject:Adding a server to an existing SOLR cloud cluster Hi We have a SOLRCloud cluster of 3 Solr servers (v4.5.0 running under Tomcat) with 1 shard. We added a new SOLR server (v4.5.1) by simply starting Tomcat and pointing it at the zookeeper ensemble used by the existing cluster. My understanding was that this new server would handshake with zookeeper and add itself as a replica to the existing cluster. What has actually happened is that the server is in zookeeper's live_nodes, but it is not in the clusterstate.json file. It also does not have a CORE/collection associated with it. Any ideas? I assume I am missing a step. Do I have to manually create the core on the new server? Cheers Ade -- View this message in context: http://lucene.472066.n3.nabble.com/Adding-a-server-to-an-existing-SOLR-cloud-cluster-tp4100275.html Sent from the Solr - User mailing list archive at Nabble.com.
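The manual step suggested above boils down to a core admin CREATE request aimed at the new server, naming the existing collection and shard. A sketch of building that URL; the host, core name, collection, and shard values are hypothetical placeholders for your cluster:

```python
# Sketch of a core admin CREATE request that attaches a replica core on a
# new node to an existing collection/shard. All names here are made up;
# substitute your own collection and shard.
from urllib.parse import urlencode

def replica_create_url(host, core_name, collection, shard):
    """Build the core admin URL; send it as a plain HTTP GET to the new node."""
    params = urlencode({
        "action": "CREATE",
        "name": core_name,
        "collection": collection,
        "shard": shard,
    })
    return f"http://{host}/solr/admin/cores?{params}"

url = replica_create_url("newhost:8080", "collection1_shard1_replica2",
                         "collection1", "shard1")
```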
Re: A few questions about solr and tika
Everything about Tika extraction is covered by those links. Basically, what you need is the following: 1) a requestHandler for Tika in solrconfig.xml 2) keep all the fields in schema.xml that are needed for Tika (they are marked in the example schema.xml) and set those you don't need to indexed=false and stored=false 3) if you want to limit the returned fields in the query response, use the query parameter 'fl'. Primoz From: wonder To: solr-user@lucene.apache.org Date: 17.10.2013 14:44 Subject:Re: A few questions about solr and tika Thanks for the answer. If I don't want to store and index any fields I do: Other questions are still open for me. 17.10.2013 14:26, primoz.sk...@policija.si wrote: > Why don't you check these: > > - Content extraction with Apache Tika ( > http://www.youtube.com/watch?v=ifgFjAeTOws) > - ExtractingRequestHandler ( > http://wiki.apache.org/solr/ExtractingRequestHandler) > - Uploading Data with Solr Cell using Apache Tika ( > https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika > ) > > Primož > > > > From: wonder > To: solr-user@lucene.apache.org > Date: 17.10.2013 12:23 > Subject:A few questions about solr and tika > > > > Hello everyone! Please tell me how and where to set Tika options in > Solr? Where is the Tika conf? I want to know how I can remove response > attributes that I don't need (such as links or images). Also I am > interested in how I can get and index only metadata for several file formats? > >
Re: A few questions about solr and tika
Why don't you check these: - Content extraction with Apache Tika ( http://www.youtube.com/watch?v=ifgFjAeTOws) - ExtractingRequestHandler ( http://wiki.apache.org/solr/ExtractingRequestHandler) - Uploading Data with Solr Cell using Apache Tika ( https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika ) Primož From: wonder To: solr-user@lucene.apache.org Date: 17.10.2013 12:23 Subject:A few questions about solr and tika Hello everyone! Please tell me how and where to set Tika options in Solr? Where is the Tika conf? I want to know how I can remove response attributes that I don't need (such as links or images). Also I am interested in how I can get and index only metadata for several file formats?
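Turning the ExtractingRequestHandler advice from this thread into concrete request shapes, here is a sketch. The /update/extract path is the stock example configuration, uprefix with an ignored_* dynamic field is one common way to drop unwanted Tika attributes, and the field names are illustrative assumptions, not settings from the thread:

```python
# Sketch of the two requests implied by the advice above: (1) send a file
# to the ExtractingRequestHandler, routing unknown Tika fields to an
# ignored dynamic field; (2) limit what a query returns with fl.
from urllib.parse import urlencode

extract_params = urlencode({
    "literal.id": "doc1",    # uniqueKey to assign to the extracted document
    "uprefix": "ignored_",   # prefix for Tika fields not in the schema
    "commit": "true",
})
extract_url = "http://localhost:8983/solr/update/extract?" + extract_params
# POST the binary file (PDF, DOCX, ...) as the request body to extract_url.

query_params = urlencode({"q": "*:*", "fl": "id,title", "wt": "json"})
query_url = "http://localhost:8983/solr/select?" + query_params
```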
Re: SolrCloud Performance Issue
The query result cache hit rate might be low due to using NOW in bf. NOW is always translated to the current time, and that of course changes from ms to ms... :) Primoz From: Shamik Bandopadhyay To: solr-user@lucene.apache.org Date: 17.10.2013 00:14 Subject:SolrCloud Performance Issue Hi, I'm in the process of transitioning to SolrCloud from a conventional master-slave model. I'm using Solr 4.4 and have set up 2 shards with 1 replica each. I have a 3-node zookeeper ensemble. All the nodes are running on AWS EC2 instances. The shards are on m1.xlarge and share a zookeeper instance (mounted on a separate volume). 6 GB of memory is allocated to each Solr instance. I have around 10 million documents in the index. With the previous standalone model, the queries averaged around 100 ms. The SolrCloud query responses have been abysmal so far. The query response time is over 1000 ms, often reaching 2000 ms. I expected some surge due to additional servers, network latency, etc., but this difference is really baffling. The hardware is similar in both cases, except for the fact that a couple of the SolrCloud nodes are sharing zookeeper as well. m1.xlarge I/O is high, so that shouldn't be a bottleneck either. The other difference from the old setup is that I'm using the new CloudSolrServer class, which references the 3 zookeepers for load balancing. But I don't think it has any major impact, as queries executed from the Solr admin query panel confirm the slowness.
Here is some of my configuration. [The solrconfig.xml excerpts pasted here lost their markup in the plain-text archive; the recoverable parts were the cache settings and a custom edismax request handler with qf boosts text^1.5 title^2 IndexTerm^.9 keywords^1.2 ADSKCommandSrch^2 ADSKContextId^1, boost queries on Source2:CloudHelp^3 and Source2:youtube^0.85, a bf of recip(ms(NOW,PublishDate),3.16e-11,1,1)^2.0, plus faceting, highlighting and spellcheck components.] One thing I've noticed is that the queryResultCache hit rate is really low, though I'm not sure our queries are really that unique. I'm using edismax and there's a recip(ms(NOW,PublishDate),3.16e-11,1,1)^2.0 boost - can this contribute? Sorry about the long post, but I'm struggling to nail down the issue here, especially when queries run fine in a master-slave environment with similar hardware and network. Any pointers will be highly appreciated. Regards, Shamik
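On the NOW-in-bf point raised above: since NOW resolves to the current millisecond, every request produces a distinct cache key even when the query text is identical. The usual mitigation, assuming Solr's standard date-math rounding, is to round NOW so that all queries within the same period hash identically:

```text
bf=recip(ms(NOW/DAY,PublishDate),3.16e-11,1,1)^2.0
```

NOW/DAY (or NOW/HOUR, for fresher boosts) trades a little boost precision for queryResultCache hits, since the function string stays constant for the whole period.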
Re: howto increase indexing speed?
I think DIH uses only one CPU core per instance. IMHO 300+ docs/sec is quite good. If you would like to use more cores, you need to use SolrJ - or maybe more than one DIH handler and, of course, more cores. Primoz From: Giovanni Bricconi To: solr-user Date: 16.10.2013 16:25 Subject:howto increase indexing speed? I have a small Solr setup, not even on a physical machine but a VMware virtual machine with a single CPU, that reads data using DIH from a database. The machine has no physical disks attached but stores data on a NetApp NAS. Currently this machine indexes 320 documents/sec - not bad, but we plan to double the index and we would like to keep nearly the same rate. Doing some basic checks during indexing, I have found with iostat that disk usage is nearly 8% and the source database is running fine; instead, the virtual CPU is 95% busy running Solr. Now I can quite easily add another virtual CPU to the Solr box, but as far as I know this won't help, because DIH doesn't work in parallel. Am I wrong? What would you do? Rewrite the feeding process, dropping DIH and using SolrJ to feed data in parallel? Would you instead keep DIH and switch to a sharded configuration? Thank you for any hints Giovanni
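The "feed in parallel" alternative to DIH amounts to partitioning the source rows and posting batches concurrently. A language-neutral sketch of just the fan-out (Python stdlib here for brevity; send_batch is a stand-in for a real SolrJ or HTTP /update client call, and the worker/batch counts are arbitrary examples):

```python
# Sketch of parallel feeding: split rows into batches and post each batch
# from a pool of workers, so more than one CPU core does indexing work.
# send_batch is a placeholder for the real client call.
from concurrent.futures import ThreadPoolExecutor

def feed_in_parallel(rows, send_batch, workers=4, batch_size=1000):
    """Post batches of rows concurrently; returns the number of batches sent."""
    batches = [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # one send per batch; Solr accepts concurrent /update requests
        list(pool.map(send_batch, batches))
    return len(batches)
```

On a single-CPU VM like the one described above this buys little; its point is to keep several cores busy once they exist, which is why adding a vCPU only helps together with a parallel feeder.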
Re: Error when i want to create a CORE
Can you try a directory path that contains *no* spaces? Primoz From: raige To: solr-user@lucene.apache.org Date: 16.10.2013 14:46 Subject:Error when i want to create a CORE I installed Solr 4.5 on Windows and launched the example with the Jetty web server. I have no problem with the collection1 core. But when I want to create my own core, the server sends me this error: * org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Could not load config file C:\Documents and Settings\r.lucas\Bureau\Moteur\solr-4.5.0\example\solr\index1\solrconfig.xml* Could you help please? -- View this message in context: http://lucene.472066.n3.nabble.com/Error-when-i-want-to-create-a-CORE-tp4095894.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Regarding Solr Cloud issue...
Hm, good question. I haven't really done any upgrading yet, because I just reinstall and reindex everything. I would replace the jars with the new ones (if needed - check the release notes for versions 4.4.0 and 4.5.0, where all the versions of external tools [Tika, Maven, etc.] are stated) and deploy the updated WAR file to the servlet container. Primoz From: Chris To: solr-user Date: 16.10.2013 14:30 Subject:Re: Regarding Solr Cloud issue... oh great. Thanks Primoz. is there any simple way to do the upgrade to 4.5 without having to change my configurations? update a few jar files etc? On Wed, Oct 16, 2013 at 4:58 PM, wrote: > >>> Also, another issue that needs to be raised is the creation of cores > from > >>> the "core admin" section of the gui, doesn't really work well, it > creates > >>> files but then they do not work (again i am using 4.4) > > From my experience the "core admin" section of the GUI does not work well in > the SolrCloud domain. If I am not mistaken this was somehow fixed in 4.5.0, > which behaves much better. > > I would use only HTTP requests ("cores and collections API") with > SolrCloud and would use the GUI only for viewing the state of the cluster and > cores. > > Primoz > > >
Re: Regarding Solr Cloud issue...
>>> Also, another issue that needs to be raised is the creation of cores from >>> the "core admin" section of the gui, doesn't really work well, it creates >>> files but then they do not work (again i am using 4.4) From my experience the "core admin" section of the GUI does not work well in the SolrCloud domain. If I am not mistaken this was somehow fixed in 4.5.0, which behaves much better. I would use only HTTP requests ("cores and collections API") with SolrCloud and would use the GUI only for viewing the state of the cluster and cores. Primoz
Re: Regarding Solr Cloud issue...
Yep, you are right - I only created extra replicas with the cores API. For a new shard I had to use the "split shard" command. My apologies. Primož From: Shalin Shekhar Mangar To: solr-user@lucene.apache.org Date: 16.10.2013 10:45 Subject:Re: Regarding Solr Cloud issue... If the initial collection was created with a numShards parameter (and hence the compositeId router), then there was no way to create a new logical shard. You can add replicas with the core admin API, but only to shards that already exist. A new logical shard can only be created by splitting an existing one. The "createshard" API also has the same limitation -- it cannot create a shard for a collection with the compositeId router. It is supposed to be used for collections with custom sharding (i.e. the "implicit" router). In such collections there is no concept of a hash range, and routing is done explicitly by the user using the "shards" parameter in the request or by sending the request to the target core/node directly. So, in summary, attempting to add a new logical shard to a collection with the compositeId router via CoreAdmin APIs is wrong, unsupported and should be disallowed. Adding replicas to existing logical shards is okay though. On Wed, Oct 16, 2013 at 12:56 PM, wrote: > If I am not mistaken the only way to create a new shard from a collection > in 4.4.0 was to use cores API. That worked fine for me until I used > *other* cores API commands. Those usually produced null ranges. > > In 4.5.0 this is fixed with newly added commands "createshard" etc. to the > collections API, right? > > Primoz > > > > From: Shalin Shekhar Mangar > To: solr-user@lucene.apache.org > Date: 16.10.2013 09:06 > Subject:Re: Regarding Solr Cloud issue... > > > > Chris, can you post your complete clusterstate.json? Do all shards have a > null range? Also, did you issue any core admin CREATE commands apart from > the create collection api. > > Primoz, I was able to reproduce this but by doing an illegal operation.
> Suppose I create a collection with numShards=5 and then I issue a core > admin create command such as: > > http://localhost:8983/solr/admin/cores?action=CREATE&name=xyz&collection=mycollection51&shard=shard6 > > > Then a "shard6" is added to the collection with a null range. This is a > bug > because we should never allow such a core admin create to succeed anyway. > I'll open an issue. > > > > On Wed, Oct 16, 2013 at 11:49 AM, wrote: > > > I sometimes also do get null ranges when doing colletions/cores API > > actions CREATE or/and UNLOAD, etc... In 4.4.0 that was not easily fixed > > because zkCli had problems with "putfile" command, but in 4.5.0 it works > > OK. All you have to do is "download" clusterstate.json from ZK ("get > > /clusterstate.json"), fix ranges to appropriate values and upload the > file > > back to ZK with zkCli. > > > > But why those null ranges happen at all is beyond me :) > > > > Primoz > > > > > > > > From: Shalin Shekhar Mangar > > To: solr-user@lucene.apache.org > > Date: 16.10.2013 07:37 > > Subject:Re: Regarding Solr Cloud issue... > > > > > > > > I'm sorry I am not able to reproduce this issue. > > > > I started 5 solr-4.4 instances. > > I copied example directory into example1, example2, example3 and > example4 > > cd example; java -Dbootstrap_confdir=./solr/collection1/conf > > -Dcollection.configName=myconf -DzkRun -DnumShards=1 -jar start.jar > > cd example1; java -Djetty.port=7574 -DzkHost=localhost:9983 -jar > start.jar > > cd example2; java -Djetty.port=7575 -DzkHost=localhost:9983 -jar > start.jar > > cd example3; java -Djetty.port=7576 -DzkHost=localhost:9983 -jar > start.jar > > cd example4; java -Djetty.port=7577 -DzkHost=localhost:9983 -jar > start.jar > > > > After that I invoked: > > > > > > http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection51&numShards=5&replicationFactor=1 > > > > > > > I can see all shards having non-null ranges in clusterstate. 
> > > > > > On Tue, Oct 15, 2013 at 8:47 PM, Chris wrote: > > > > > Hi Shalin,. > > > > > > Thank you for your quick reply. I appreciate all the help. > > > > > > I started the solr cloud servers first...with 5 nodes. > > > > > > then i issued a command like below to create the shards - > > > > > > > > > > > > > > > http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=5&replicationFactor=1 > > > > > > < > > > > > > > > > http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=4 > > > > > > > > > > > > > Please advice. > > > > > > Regards, > > > Chris > > > > > > > > > On Tue, Oct 15, 2013 at 8:07 PM, Shalin Shekhar Mangar < > > > shalinman...@gmail.com> wrote: > > > > > > > How did you create these shards? Can you tell us how to reproduce > the > > > > issue? > > > > > > > > Any shard in a collection with compositeId router should never have > > null > > > > ranges. > > > > > > > > > > >
Re: Regarding Solr Cloud issue...
If I am not mistaken the only way to create a new shard from a collection in 4.4.0 was to use cores API. That worked fine for me until I used *other* cores API commands. Those usually produced null ranges. In 4.5.0 this is fixed with newly added commands "createshard" etc. to the collections API, right? Primoz From: Shalin Shekhar Mangar To: solr-user@lucene.apache.org Date: 16.10.2013 09:06 Subject:Re: Regarding Solr Cloud issue... Chris, can you post your complete clusterstate.json? Do all shards have a null range? Also, did you issue any core admin CREATE commands apart from the create collection api. Primoz, I was able to reproduce this but by doing an illegal operation. Suppose I create a collection with numShards=5 and then I issue a core admin create command such as: http://localhost:8983/solr/admin/cores?action=CREATE&name=xyz&collection=mycollection51&shard=shard6 Then a "shard6" is added to the collection with a null range. This is a bug because we should never allow such a core admin create to succeed anyway. I'll open an issue. On Wed, Oct 16, 2013 at 11:49 AM, wrote: > I sometimes also do get null ranges when doing colletions/cores API > actions CREATE or/and UNLOAD, etc... In 4.4.0 that was not easily fixed > because zkCli had problems with "putfile" command, but in 4.5.0 it works > OK. All you have to do is "download" clusterstate.json from ZK ("get > /clusterstate.json"), fix ranges to appropriate values and upload the file > back to ZK with zkCli. > > But why those null ranges happen at all is beyond me :) > > Primoz > > > > From: Shalin Shekhar Mangar > To: solr-user@lucene.apache.org > Date: 16.10.2013 07:37 > Subject:Re: Regarding Solr Cloud issue... > > > > I'm sorry I am not able to reproduce this issue. > > I started 5 solr-4.4 instances. 
> I copied example directory into example1, example2, example3 and example4 > cd example; java -Dbootstrap_confdir=./solr/collection1/conf > -Dcollection.configName=myconf -DzkRun -DnumShards=1 -jar start.jar > cd example1; java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar > cd example2; java -Djetty.port=7575 -DzkHost=localhost:9983 -jar start.jar > cd example3; java -Djetty.port=7576 -DzkHost=localhost:9983 -jar start.jar > cd example4; java -Djetty.port=7577 -DzkHost=localhost:9983 -jar start.jar > > After that I invoked: > > http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection51&numShards=5&replicationFactor=1 > > > I can see all shards having non-null ranges in clusterstate. > > > On Tue, Oct 15, 2013 at 8:47 PM, Chris wrote: > > > Hi Shalin,. > > > > Thank you for your quick reply. I appreciate all the help. > > > > I started the solr cloud servers first...with 5 nodes. > > > > then i issued a command like below to create the shards - > > > > > > > > http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=5&replicationFactor=1 > > > < > > > > http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=4 > > > > > > > > Please advice. > > > > Regards, > > Chris > > > > > > On Tue, Oct 15, 2013 at 8:07 PM, Shalin Shekhar Mangar < > > shalinman...@gmail.com> wrote: > > > > > How did you create these shards? Can you tell us how to reproduce the > > > issue? > > > > > > Any shard in a collection with compositeId router should never have > null > > > ranges. > > > > > > > > > On Tue, Oct 15, 2013 at 7:07 PM, Chris wrote: > > > > > > > Hi, > > > > > > > > I am using solr 4.4 as cloud. while creating shards i see that the > last > > > > shard has range of "null". i am not sure if this is a bug. 
> > > > > > > > I am stuck with having null value for the range in clusterstate.json > > > > (attached below) > > > > > > > > "shard5":{ "range":null, "state":"active", > "replicas":{"core_node1":{ > > > > "state":"active", "core":"Web_shard5_replica1", > > > > "node_name":"domain-name.com:1981_solr", "base_url":" > > > > http://domain-name.com:1981/solr";, "leader":"true", > > > > "router":"compositeId"}, > > > > > > > > I tried to use zookeeper cli to change this, but it was not able to. > I > > > > tried to locate this file, but didn't find it anywhere. > > > > > > > > Can you please let me know how do i change the range from null to > > > something > > > > meaningful? i have the range that i need, so if i can find the file, > > > maybe > > > > i can change it manually. > > > > > > > > My next question is - can we have a catch all for ranges, i mean if > > > things > > > > don't match any other range then insert in this shard..is this > > possible? > > > > > > > > Kindly advice. > > > > Chris > > > > > > > > > > > > > > > > -- > > > Regards, > > > Shalin Shekhar Mangar. > > > > > > > > > -- > Regards, > Shalin Shekhar Mangar. > > -- Regards, Shalin Shekhar Mangar.
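The zkCli workaround quoted earlier in this thread (get /clusterstate.json, fix the range, putfile it back) has one fiddly middle step: editing the JSON itself. A sketch of that step only; the collection name, shard name, and hex range below are hypothetical, and the real range must be derived from the hash ranges the other shards already cover:

```python
# Sketch of patching a null shard range in a downloaded clusterstate.json
# before uploading it back with zkCli putfile. All names and the range
# value are placeholders; use the gaps left by your other shards.
import json

def patch_null_range(clusterstate_json, collection, shard, new_range):
    """Return clusterstate JSON with the shard's null range replaced."""
    state = json.loads(clusterstate_json)
    shard_info = state[collection]["shards"][shard]
    if shard_info["range"] is None:  # only touch shards that are broken
        shard_info["range"] = new_range
    return json.dumps(state, indent=2)
```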
Re: Cores with lot of folders with prefix index.XXXXXXX
I will certainly try, but give me some time :) Primoz From: Shalin Shekhar Mangar To: solr-user@lucene.apache.org Date: 16.10.2013 07:05 Subject:Re: Cores with lot of folders with prefix index.XXX I think that's an acceptable strategy. Can you put up a patch? On Tue, Oct 15, 2013 at 2:32 PM, wrote: > I have a question for developers of Solr regarding the issue of > "left-over" index folders when replication fails. Could be this issue > resolved quickly if when replication starts Solr creates a "flag file" in > "index." folder and when replication ends (and commits) this file is > deleted? In this case if a server is restarted (or on schedule) it could > quickly scan all the "index." folders and delete those (maybe not the > last one or those relevant to the index.properties file) that still > *contain* a flag file and are so unfinished and uncommited. > > I have not really looked at the code yet so I may have a different view on > the workings of replication. Would the solution I described at least > address this issue? > > Best regards, > > Primoz > > > > > > From: primoz.sk...@policija.si > To: solr-user@lucene.apache.org > Date: 11.10.2013 12:46 > Subject:Re: Cores with lot of folders with prefix index.XXX > > > > Thanks, I guess I was wrong after all in my last post. > > Primož > > > > > From: Shalin Shekhar Mangar > To: solr-user@lucene.apache.org > Date: 11.10.2013 12:43 > Subject:Re: Cores with lot of folders with prefix index.XXX > > > > There are open issues related to extra index.XXX folders lying around if > replication/recovery fails. See > https://issues.apache.org/jira/browse/SOLR-4506 > > > On Fri, Oct 11, 2013 at 4:06 PM, Yago Riveiro > wrote: > > > The thread that you point is about master / slave - replication, Is this > > issue valid on SolrCloud context? > > > > I check the index.properties and indeed the variable index=index.X > > point to a folder, the others can be deleted without any scary side > effect? 
> > > > > > -- > > Yago Riveiro > > Sent with Sparrow (http://www.sparrowmailapp.com/?sig) > > > > > > On Friday, October 11, 2013 at 11:31 AM, primoz.sk...@policija.si wrote: > > > Do you have a lot of failed replications? Maybe those folders have > > > something to do with this (please see the last answer at > > > > > > > http://stackoverflow.com/questions/3145192/why-does-my-solr-slave-index-keep-growing > > > > > ). If your disk space is valuable check index.properties file under > data > > > folder and try to determine which folders can be safely deleted. > > > > > > Primož > > > > > > > > > > > > > > > From: Yago Riveiro > yago.rive...@gmail.com)> > > > To: solr-user@lucene.apache.org (mailto:solr-user@lucene.apache.org) > > > Date: 11.10.2013 12:13 > > > Subject: Re: Cores with lot of folders with prefix index.XXX > > > > > > > > > > > > I have ssd's therefor my space is like gold, I can have 30% of my > space > > > waste in failed replications, or replications that are not cleaned. > > > > > > The question for me is if this a normal behaviour or is a bug. If is a > > > normal behaviour I have a trouble because a ssd with more than 512G is > > > expensive. > > > > > > -- > > > Yago Riveiro > > > Sent with Sparrow (http://www.sparrowmailapp.com/?sig) > > > > > > > > > On Friday, October 11, 2013 at 11:03 AM, > primoz.sk...@policija.si(mailto: > > primoz.sk...@policija.si) wrote: > > > > > > > I think this is connected to replications being made? I also have > quite > > > > some of them but currently I am not worried :) > > > > > > > > > > > > > > > > > > > > > > -- > Regards, > Shalin Shekhar Mangar. > > > -- Regards, Shalin Shekhar Mangar.
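Separately from the flag-file proposal, the check already described in this thread (keep the folder that index.properties points at, treat the other index.* folders as cleanup candidates) can be sketched as follows. The data-dir layout assumed is the standard core data/ directory, and nothing is deleted automatically; whether a candidate is safe to remove stays the operator's call:

```python
# Sketch of listing index.* folders under a core's data dir that
# index.properties does NOT point at. It only reports candidates; it does
# not delete anything.
import os

def stale_index_dirs(data_dir):
    """Return index.* directory names other than the live one."""
    live = "index"  # default when no index.properties exists
    props = os.path.join(data_dir, "index.properties")
    if os.path.exists(props):
        with open(props) as f:
            for line in f:
                if line.strip().startswith("index="):
                    live = line.strip().split("=", 1)[1]
    return sorted(
        name for name in os.listdir(data_dir)
        if name.startswith("index")
        and os.path.isdir(os.path.join(data_dir, name))
        and name != live
    )
```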
Re: Regarding Solr Cloud issue...
I sometimes also get null ranges when doing collections/cores API actions (CREATE or/and UNLOAD, etc.). In 4.4.0 that was not easily fixed, because zkCli had problems with the "putfile" command, but in 4.5.0 it works OK. All you have to do is "download" clusterstate.json from ZK ("get /clusterstate.json"), fix the ranges to appropriate values, and upload the file back to ZK with zkCli. But why those null ranges happen at all is beyond me :) Primoz From: Shalin Shekhar Mangar To: solr-user@lucene.apache.org Date: 16.10.2013 07:37 Subject:Re: Regarding Solr Cloud issue... I'm sorry I am not able to reproduce this issue. I started 5 solr-4.4 instances. I copied example directory into example1, example2, example3 and example4 cd example; java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkRun -DnumShards=1 -jar start.jar cd example1; java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar cd example2; java -Djetty.port=7575 -DzkHost=localhost:9983 -jar start.jar cd example3; java -Djetty.port=7576 -DzkHost=localhost:9983 -jar start.jar cd example4; java -Djetty.port=7577 -DzkHost=localhost:9983 -jar start.jar After that I invoked: http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection51&numShards=5&replicationFactor=1 I can see all shards having non-null ranges in clusterstate. On Tue, Oct 15, 2013 at 8:47 PM, Chris wrote: > Hi Shalin,. > > Thank you for your quick reply. I appreciate all the help. > > I started the solr cloud servers first...with 5 nodes. > > then i issued a command like below to create the shards - > > > http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=5&replicationFactor=1 > < > http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=4 > > > > Please advice. > > Regards, > Chris > > > On Tue, Oct 15, 2013 at 8:07 PM, Shalin Shekhar Mangar < > shalinman...@gmail.com> wrote: > > > How did you create these shards?
Can you tell us how to reproduce the > > issue? > > > > Any shard in a collection with compositeId router should never have null > > ranges. > > > > > > On Tue, Oct 15, 2013 at 7:07 PM, Chris wrote: > > > > > Hi, > > > > > > I am using solr 4.4 as cloud. while creating shards i see that the last > > > shard has range of "null". i am not sure if this is a bug. > > > > > > I am stuck with having null value for the range in clusterstate.json > > > (attached below) > > > > > > "shard5":{ "range":null, "state":"active", "replicas":{"core_node1":{ > > > "state":"active", "core":"Web_shard5_replica1", > > > "node_name":"domain-name.com:1981_solr", "base_url":" > > > http://domain-name.com:1981/solr";, "leader":"true", > > > "router":"compositeId"}, > > > > > > I tried to use zookeeper cli to change this, but it was not able to. I > > > tried to locate this file, but didn't find it anywhere. > > > > > > Can you please let me know how do i change the range from null to > > something > > > meaningful? i have the range that i need, so if i can find the file, > > maybe > > > i can change it manually. > > > > > > My next question is - can we have a catch all for ranges, i mean if > > things > > > don't match any other range then insert in this shard..is this > possible? > > > > > > Kindly advice. > > > Chris > > > > > > > > > > > -- > > Regards, > > Shalin Shekhar Mangar. > > > -- Regards, Shalin Shekhar Mangar.
Re: Cores with lot of folders with prefix index.XXXXXXX
I have a question for the developers of Solr regarding the issue of "left-over" index folders when replication fails. Could this issue be resolved quickly if, when replication starts, Solr creates a "flag file" in the "index." folder, and when replication ends (and commits) this file is deleted? In that case, if a server is restarted (or on a schedule), it could quickly scan all the "index." folders and delete those (maybe not the last one, or those relevant to the index.properties file) that still *contain* a flag file and are thus unfinished and uncommitted. I have not really looked at the code yet, so I may have a different view of the workings of replication. Would the solution I described at least address this issue? Best regards, Primoz From: primoz.sk...@policija.si To: solr-user@lucene.apache.org Date: 11.10.2013 12:46 Subject:Re: Cores with lot of folders with prefix index.XXX Thanks, I guess I was wrong after all in my last post. Primož From: Shalin Shekhar Mangar To: solr-user@lucene.apache.org Date: 11.10.2013 12:43 Subject:Re: Cores with lot of folders with prefix index.XXX There are open issues related to extra index.XXX folders lying around if replication/recovery fails. See https://issues.apache.org/jira/browse/SOLR-4506 On Fri, Oct 11, 2013 at 4:06 PM, Yago Riveiro wrote: > The thread that you point is about master / slave - replication, Is this > issue valid on SolrCloud context? > > I check the index.properties and indeed the variable index=index.X > point to a folder, the others can be deleted without any scary side effect? > > > -- > Yago Riveiro > Sent with Sparrow (http://www.sparrowmailapp.com/?sig) > > > On Friday, October 11, 2013 at 11:31 AM, primoz.sk...@policija.si wrote: > > Do you have a lot of failed replications? Maybe those folders have > > something to do with this (please see the last answer at > > > http://stackoverflow.com/questions/3145192/why-does-my-solr-slave-index-keep-growing > > ).
If your disk space is valuable, check the index.properties file under the data > > folder and try to determine which folders can be safely deleted. > > > > Primož > > > > > > > > > > From: Yago Riveiro (yago.rive...@gmail.com) > > To: solr-user@lucene.apache.org (mailto:solr-user@lucene.apache.org) > > Date: 11.10.2013 12:13 > > Subject: Re: Cores with lot of folders with prefix index.XXX > > > > > > > > I have SSDs, therefore my space is like gold; I can have 30% of my space > > wasted on failed replications, or replications that are not cleaned up. > > > > The question for me is whether this is normal behaviour or a bug. If it is > > normal behaviour I am in trouble because an SSD with more than 512G is > > expensive. > > > > -- > > Yago Riveiro > > Sent with Sparrow (http://www.sparrowmailapp.com/?sig) > > > > > > On Friday, October 11, 2013 at 11:03 AM, primoz.sk...@policija.si (mailto:primoz.sk...@policija.si) wrote: > > > > > I think this is connected to replications being made? I also have quite > > > a few of them but currently I am not worried :) > > > > > > > > > > > > -- Regards, Shalin Shekhar Mangar.
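The flag-file proposal above could be sketched roughly like this (the flag-file name and the helper are hypothetical, not actual Solr code): scan the data directory, keep the folder named in index.properties, and remove any other index.* folder that still contains the flag left behind by an unfinished replication.

```python
import os
import shutil

FLAG = "replication.in.progress"  # hypothetical flag-file name from the proposal

def cleanup_stale_index_dirs(data_dir):
    """Delete index.* folders that still contain the flag file, keeping
    the directory referenced by index.properties (sketch of the idea above)."""
    keep = "index"  # default index dir when index.properties is absent
    props = os.path.join(data_dir, "index.properties")
    if os.path.exists(props):
        with open(props) as f:
            for line in f:
                if line.startswith("index="):
                    keep = line.strip().split("=", 1)[1]
    removed = []
    for name in os.listdir(data_dir):
        path = os.path.join(data_dir, name)
        if (name.startswith("index.") and name != keep
                and os.path.isdir(path)
                and os.path.exists(os.path.join(path, FLAG))):
            shutil.rmtree(path)  # flag still present => replication never committed
            removed.append(name)
    return sorted(removed)
```

A folder without the flag is left alone, so a crash between commit and flag removal would at worst leave one extra folder rather than delete live data.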
Re: Cores with lot of folders with prefix index.XXXXXXX
Honestly I don't know for sure if you can delete them. Maybe make a backup, then delete them and see if it still works :) Replication works differently in the SolrCloud world, as far as I know. I don't think there are any additional index.* folders because fallback does not work in SolrCloud (someone correct me if I am wrong!). Primož From: Yago Riveiro To: solr-user@lucene.apache.org Date: 11.10.2013 12:36 Subject:Re: Cores with lot of folders with prefix index.XXX The thread that you point to is about master/slave replication. Is this issue valid in a SolrCloud context? I checked the index.properties and indeed the variable index=index.X points to a folder; can the others be deleted without any scary side effect? -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Friday, October 11, 2013 at 11:31 AM, primoz.sk...@policija.si wrote: > Do you have a lot of failed replications? Maybe those folders have > something to do with this (please see the last answer at > http://stackoverflow.com/questions/3145192/why-does-my-solr-slave-index-keep-growing > ). If your disk space is valuable, check the index.properties file under the data > folder and try to determine which folders can be safely deleted. > > Primož > > > > > From: Yago Riveiro (yago.rive...@gmail.com) > To: solr-user@lucene.apache.org (mailto:solr-user@lucene.apache.org) > Date: 11.10.2013 12:13 > Subject: Re: Cores with lot of folders with prefix index.XXX > > > > I have SSDs, therefore my space is like gold; I can have 30% of my space > wasted on failed replications, or replications that are not cleaned up. > > The question for me is whether this is normal behaviour or a bug. If it is > normal behaviour I am in trouble because an SSD with more than 512G is > expensive. > > -- > Yago Riveiro > Sent with Sparrow (http://www.sparrowmailapp.com/?sig) > > > On Friday, October 11, 2013 at 11:03 AM, primoz.sk...@policija.si (mailto:primoz.sk...@policija.si) wrote: > > > I think this is connected to replications being made? I also have quite > > a few of them but currently I am not worried :)
Re: Cores with lot of folders with prefix index.XXXXXXX
Do you have a lot of failed replications? Maybe those folders have something to do with this (please see the last answer at http://stackoverflow.com/questions/3145192/why-does-my-solr-slave-index-keep-growing ). If your disk space is valuable, check the index.properties file under the data folder and try to determine which folders can be safely deleted. Primož From: Yago Riveiro To: solr-user@lucene.apache.org Date: 11.10.2013 12:13 Subject:Re: Cores with lot of folders with prefix index.XXX I have SSDs, therefore my space is like gold; I can have 30% of my space wasted on failed replications, or replications that are not cleaned up. The question for me is whether this is normal behaviour or a bug. If it is normal behaviour I am in trouble because an SSD with more than 512G is expensive. -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Friday, October 11, 2013 at 11:03 AM, primoz.sk...@policija.si wrote: > I think this is connected to replications being made? I also have quite > a few of them but currently I am not worried :)
Re: Cores with lot of folders with prefix index.XXXXXXX
I think this is connected to replications being made. I also have quite a few of them but currently I am not worried :) Primož From: yriveiro To: solr-user@lucene.apache.org Date: 11.10.2013 11:54 Subject:Cores with lot of folders with prefix index.XXX Hi, I have some cores with a lot of folders with the format index.X; my question is why? The collateral effect of this is shards that take up 50% more space than their replicas on other nodes. Is there any way to delete these folders to free space? Is it a bug? /Yago - Best regards -- View this message in context: http://lucene.472066.n3.nabble.com/Cores-with-lot-of-folders-with-prefix-index-XXX-tp4094920.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Cloud Basic Authentification
If you want to deploy basic authentication so that a login is required when creating collections, it is simply a matter of constraining a URL pattern (e.g. /solr/admin/collections/*). Maybe this link will help: http://stackoverflow.com/questions/5323855/jetty-webserver-security/5332049#5332049 But keep in mind that intra-node requests in SolrCloud must also be authenticated (because the HTTP stack is used). If I understand correctly this is currently not possible. Primož From: maephisto To: solr-user@lucene.apache.org Date: 11.10.2013 11:25 Subject:Re: Solr Cloud Basic Authentification Thank you, but I'm afraid that wiki page does not cover my topic of interest -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Cloud-Basic-Authentification-tp4094903p4094915.html Sent from the Solr - User mailing list archive at Nabble.com.
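For reference, the Jetty approach linked above boils down to a web.xml fragment along these lines. This is a sketch under assumptions: the role name, realm name, and URL pattern are placeholders to adapt to your deployment, and a matching realm (e.g. a HashLoginService) must be configured on the Jetty side.

```xml
<!-- Require a BASIC login for the Collections API only (sketch; adjust to your deployment) -->
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Collections API</web-resource-name>
    <url-pattern>/admin/collections/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>solr-admin</role-name>
  </auth-constraint>
</security-constraint>
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>solr-users</realm-name>
</login-config>
```

As noted above, this protects the external endpoint only; SolrCloud's own intra-node HTTP requests would not carry these credentials.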
Re: Solr Cloud Basic Authentification
One possible solution is to "firewall" access to the SolrCloud server(s). Only proxy/load-balancing servers should have unrestricted access to the Solr infrastructure. Then you can implement basic/advanced authentication on the proxy/LB side. Primož From: maephisto To: solr-user@lucene.apache.org Date: 11.10.2013 11:17 Subject:Re: Solr Cloud Basic Authentification Thank you! I'm more interested in the SolrCloud architecture, with shards, shard replicas and distributed indexing and search. These are the features I use and would like to protect with some basic authentication. I imagine that there must be a way to have this, otherwise anybody could mess with or even drop my collection. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Cloud-Basic-Authentification-tp4094903p4094911.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Cloud Basic Authentification
For pre-4.x Solr (aka Solr 3.x) basic authentication works fine. Check this site: http://wiki.apache.org/solr/SolrSecurity Even a "master-slave replication architecture" (*not* SolrCloud) works for me. There could be some problems with *cross-shard* queries etc. though (see SOLR-1861, SOLR-3421). I know I haven't answered your question but hopefully I have given you some more information on the subject. Best regards, Primož From: maephisto To: solr-user@lucene.apache.org Date: 11.10.2013 10:55 Subject:Solr Cloud Basic Authentification I've deployed a SolrCloud cluster in Jetty 9 using Solr 4.4.0 and I would like to add some basic authentication. My question is how can I provide the credentials so that they're used by the Collection API when creating a new collection, or by ZK? Are there any useful docs/wiki on this topic? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Cloud-Basic-Authentification-tp4094903.html Sent from the Solr - User mailing list archive at Nabble.com.
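On the client side, HTTP Basic authentication as discussed above is just an Authorization header carrying base64("user:password"). A minimal sketch (the helper name is my own, not a SolrJ or Solr API):

```python
import base64

def basic_auth_header(user, password):
    """Build the HTTP Basic Authorization header a client would attach
    to each request against a password-protected Solr (RFC 7617 scheme)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

Any HTTP client can then merge this dict into its request headers; the server-side realm (Jetty, proxy, etc.) does the actual credential check.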
Re: Collection API wrong configuration
Works fine on my end. I use Solr 4.5.0 on Windows 7. I tried: >zkcli.bat -cmd upconfig -zkhost localhost:9000 -d ..\solr\collection2\conf -n my_custom_collection >java -Djetty.port=8001 -DzkHost=localhost:9000 -jar start.jar and finally http://localhost:8001/solr/admin/collections?action=CREATE&name=my_custom_collection&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=my_custom_collection If I open the newly created core/shard I can see the modified schema file under "Schema". Best regards, Primož From: maephisto To: solr-user@lucene.apache.org Date: 09.10.2013 11:57 Subject:Collection API wrong configuration I'm experimenting with SolrCloud using Solr 4.5.0 and the Collection API. What I did was: 1. upload the configuration to ZK: zkcli.sh -cmd upconfig -zkhost 127.0.0.1:8993 -d solr/my_custom_collection/conf/ -n my_custom_collection 2. create a collection using the API: /admin/collections?action=CREATE&name=my_custom_collection&numShards=2&replicationFactor=2&maxShardsPerNode=2&collection.configName=my_custom_config The outcome of these actions seems to be that the collection cores don't use my_custom_collection but the example configuration. Any idea why this is happening? -- View this message in context: http://lucene.472066.n3.nabble.com/Collection-API-wrong-configuration-tp4094319.html Sent from the Solr - User mailing list archive at Nabble.com.
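One detail worth noting in the exchange above: the config was uploaded with -n my_custom_collection, but the CREATE call asked for collection.configName=my_custom_config, so the names do not match (the working example uses the same name in both places). The CREATE URL can be composed programmatically to avoid such slips; a sketch (the function name is my own):

```python
from urllib.parse import urlencode

def create_collection_url(base_url, name, config_name,
                          num_shards=1, replication_factor=1,
                          max_shards_per_node=1):
    """Compose a Collections API CREATE request URL, mirroring the
    parameters used in the calls shown above (illustrative sketch)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "maxShardsPerNode": max_shards_per_node,
        # must match the -n name passed to zkcli's upconfig
        "collection.configName": config_name,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"

print(create_collection_url("http://localhost:8001/solr",
                            "my_custom_collection", "my_custom_collection"))
```

Passing the same value for the collection name and config name, as in the working example, keeps the two in sync.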
Re: Hardware dimension for new SolrCloud cluster
I think Mr. Erickson summarized the issue of hardware sizing quite well in the following article: http://searchhub.org/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ Best regards, Primož From: Henrik Ossipoff Hansen To: "solr-user@lucene.apache.org" Date: 08.10.2013 14:59 Subject:Hardware dimension for new SolrCloud cluster We're in the process of moving onto SolrCloud, and have gotten to the point where we are considering how to do our hardware setup. We're limited to VMs running on our server cluster and storage system, so buying new physical servers is out of the question; the question is how we should dimension the new VMs. Our document area is somewhat small, with about 1.2 million orders (rising of course), 75k products (divided into 5 countries, each of which will be its own collection/core) and a few million customers. In our current master/slave setup, we only index the products, with each country taking up about 35 MB of disk space. The index frequency is more or less 8 updates per hour (mostly this is not all data though, but atomic updates with new stock data, new prices etc.). Our upcoming order and customer indexes however will more or less receive updates "on the fly" as they happen (soft commit), and we expect the same to be the case for products in the near future. - For hardware, it's down to 1 or 2 cores - the current master runs with 2 cores - RAM - currently our master runs with 6 GB only - How much heap space should we allocate for max heap? We currently plan on this setup: - 1 machine for a simple load balancer - 4 VMs in total for the Solr machines themselves (for both leaders and replicas; just one replica per shard is enough for our use case) - A quorum of 3 ZKs Question is - is this machine setup enough? And how exactly do we dimension the Solr machines? Any help, pointers or resources will be much appreciated :) Thank you!