Re: Corrupt SSTABLE over and over
Hi Alaa,

Sounds like you have problems that go beyond Cassandra -- likely filesystem corruption or bad disks. I don't know enough about Windows to give you any specific advice, but I'd try a run of chkdsk to start.

--Bryan

On Fri, Aug 12, 2016 at 5:19 PM, Alaa Zubaidi (PDF) wrote:
> Hi Bryan,
>
> Changing disk_failure_policy to best_effort and running nodetool scrub
> did not work; it generated another error:
> java.nio.file.AccessDeniedException
>
> I also tried removing all files (data, commitlog, saved_caches) and
> restarting the node fresh, and I am still getting corruption.
>
> And still nothing that indicates there is a HW issue --
> all other nodes are fine.
>
> Regards,
> Alaa
>
> On Fri, Aug 12, 2016 at 12:00 PM, Bryan Cheng wrote:
>
>> Should also add that if the scope of corruption is _very_ large, and you
>> have a good, aggressive repair policy (read: you are confident in the
>> consistency of the data elsewhere in the cluster), you may just want to
>> decommission and rebuild that node.
>>
>> On Fri, Aug 12, 2016 at 11:55 AM, Bryan Cheng wrote:
>>
>>> Looks like you're doing the offline scrub -- have you tried online?
>>>
>>> Here's my typical process for corrupt SSTables.
>>>
>>> With disk_failure_policy set to stop, examine the failing sstables. If
>>> they are very small (in the range of KBs), it is unlikely that there is
>>> any salvageable data there. Just delete them, start the machine, and
>>> schedule a repair ASAP.
>>>
>>> If they are large, then it may be worth salvaging. If the scope of
>>> corruption is reasonable (limited to a few sstables scattered among
>>> different keyspaces), set disk_failure_policy to best_effort, start the
>>> machine up, and run nodetool scrub. This online scrub is faster than
>>> offline scrub (at least as of 2.1.12, the last time I had to do this).
>>>
>>> Only if all else fails, attempt the very painful offline sstablescrub.
>>>
>>> Is the VMware client Windows? (Trying to make sure it's not just the
>>> host.) YMMV, but in the past Windows was somewhat of a neglected
>>> platform wrt Cassandra. I think you'd have a much easier time getting
>>> help if running Linux is an option here.
>>>
>>> On Fri, Aug 12, 2016 at 9:16 AM, Alaa Zubaidi (PDF) <alaa.zuba...@pdf.com> wrote:
>>>
>>>> Hi Jason,
>>>>
>>>> Thanks for your input... That's what I am afraid of.
>>>> Did you find any HW error in the VMware and HW logs? Any indication
>>>> that the HW is the reason? I need to make sure that this is the reason
>>>> before asking the customer to spend more money.
>>>>
>>>> Thanks,
>>>> Alaa
>>>>
>>>> On Thu, Aug 11, 2016 at 11:02 PM, Jason Wee wrote:
>>>>
>>>>> Cassandra runs on a virtual server (VMware)?
>>>>>
>>>>>> I tried sstablescrub but it crashed with hs-err-pid-...
>>>>> Maybe try with a larger heap allocated to sstablescrub.
>>>>>
>>>>> This sstable corruption I ran into as well (on Cassandra 1.2). First
>>>>> I tried nodetool scrub -- it still persisted; then offline
>>>>> sstablescrub -- still persisted; I wiped the node and it happened
>>>>> again; then I changed the hardware (disk and mem) and things went
>>>>> good.
>>>>>
>>>>> hth
>>>>> jason
>>>>>
>>>>> On Fri, Aug 12, 2016 at 9:20 AM, Alaa Zubaidi (PDF) wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have a 16-node cluster, Cassandra 2.2.1 on Windows, local
>>>>>> installation (NOT on the cloud), and I am getting:
>>>>>>
>>>>>> ERROR [CompactionExecutor:2] 2016-08-12 06:51:52,983
>>>>>> CassandraDaemon.java:183 - Exception in thread Thread[CompactionExecutor:2,1,main]
>>>>>> org.apache.cassandra.io.FSReadError:
>>>>>> org.apache.cassandra.io.sstable.CorruptSSTableException:
>>>>>> org.apache.cassandra.io.compress.CorruptBlockException:
>>>>>> (E:\\la-4886-big-Data.db): corruption detected, chunk at 4969092 of length 10208.
>>>>>> at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:357)
>>>>>> ~[apache-cassandra-2.2.1.jar:2.2.1]
>>>>>>
>>>>>> ERROR [CompactionExecutor:2] ... FileUtils.java:463 - Exiting
>>>>>> forcefully due to file system exception on startup, disk failure
>>>>>> policy "stop"
>>>>>>
>>>>>> I tried sstablescrub but it crashed with hs-err-pid-...
>>>>>> I removed the corrupted file and started the node again; after one
>>>>>> day the corruption came back. I removed the files and restarted
>>>>>> Cassandra; it worked for a few days, then I ran "nodetool repair".
>>>>>> After it finished, Cassandra failed again, but with commitlog
>>>>>> corruption. After removing the commitlog files, it failed again with
>>>>>> another sstable corruption.
>>>>>>
>>>>>> I was also checking the HW, file system, and memory: the VMware logs
>>>>>> showed no HW error, and the HW management logs showed NO problems or
>>>>>> issues.
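Bryan's triage rule above (tiny corrupt sstables: delete and repair; large ones: online scrub) can be sketched as a small helper. This is a hypothetical illustration, not anything from the thread: the 1 MB threshold and the function name are assumptions, and the actual cluster commands are only noted in comments.

```shell
#!/bin/sh
# Sketch of the corrupt-SSTable triage described above. A corrupt file in
# the KB range rarely holds salvageable data, so delete it and repair; a
# large one is worth an online scrub (disk_failure_policy: best_effort,
# then "nodetool scrub"). The 1 MB cutoff is an assumed example value.
triage() {
  size_bytes=$1
  if [ "$size_bytes" -lt 1048576 ]; then
    echo "delete-then-repair"   # rm the sstable set, start node, nodetool repair ASAP
  else
    echo "online-scrub"         # best_effort policy, start node, nodetool scrub
  fi
}

triage 8192        # a few KB of data
triage 536870912   # ~512 MB of data
```

Offline sstablescrub stays as the last resort, as Bryan notes, since it is slower and (per Jason's experience) may need a larger heap to avoid crashing.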
Re: New node block in autobootstrap
What version are you on? This seems like a typical case where there was a problem with streaming (hanging, etc.). Do you have access to the logs? Maybe look for streaming errors? Typically streaming errors are related to timeouts, so you should review your Cassandra streaming_socket_timeout_in_ms and kernel tcp_keepalive settings. If you're on 2.2+ you can resume a failed bootstrap with "nodetool bootstrap resume". There were also some streaming hanging problems fixed recently, so I'd advise you to upgrade to the latest version of your particular series for a more robust version.

Is there any reason why you didn't use the replace procedure (-Dreplace_address) to replace the node with the same tokens? This would be a bit faster than the remove + bootstrap procedure.

2016-08-15 15:37 GMT-03:00 Jérôme Mainaud:
> Hello,
>
> A client of mine has problems when adding a node to the cluster.
> After 4 days, the node is still in joining mode; it doesn't have the
> same level of load as the others, and there seems to be no streaming
> from or to the new node.
>
> This node has a history.
>
> 1. At the beginning, it was a seed in the cluster.
> 2. Ops detected that clients had problems with it.
> 3. They tried to reset it but failed. In the process they launched
>    several repair and rebuild processes on the node.
> 4. Then they asked me to help them.
> 5. We stopped the node,
> 6. removed it from the list of seeds (more precisely, it was replaced
>    by another node),
> 7. removed it from the cluster (I chose not to use decommission since
>    the node's data was compromised),
> 8. deleted all files from the data, commitlog and saved_caches
>    directories,
> 9. and after the leaving process ended, it was started as a fresh new
>    node and began auto-bootstrap.
>
> As I don't have direct access to the cluster I don't have a lot of
> information, but I will have tomorrow (logs and results of some
> commands). And I can ask people for any required information.
>
> Does someone have any idea of what could have happened and what I
> should investigate first?
> What would you do to unlock the situation?
>
> Context: The cluster consists of two DCs, each with 15 nodes. Average
> load is around 3 TB per node. The joining node froze a little after
> 2 TB.
>
> Thank you for your help.
> Cheers,
>
> --
> Jérôme Mainaud
> jer...@mainaud.com
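The timeout setting mentioned above lives in cassandra.yaml. A sketch of the fragment to check before wiping and re-bootstrapping the node; the 24-hour value is an illustrative example, not a recommendation from the thread:

```yaml
# Illustrative cassandra.yaml fragment for a bootstrap that hangs
# mid-stream. If streams die around a fixed interval, this timeout is a
# likely suspect; 86400000 ms (24 h) is an example value only.
streaming_socket_timeout_in_ms: 86400000
```

After adjusting it, on 2.2+ the stuck join can be retried in place with "nodetool bootstrap resume" instead of deleting the data directories again, or the node can be restarted with -Dreplace_address to take over the old node's tokens as suggested above.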
New node block in autobootstrap
Hello,

A client of mine has problems when adding a node to the cluster. After 4 days, the node is still in joining mode; it doesn't have the same level of load as the others, and there seems to be no streaming from or to the new node.

This node has a history.

1. At the beginning, it was a seed in the cluster.
2. Ops detected that clients had problems with it.
3. They tried to reset it but failed. In the process they launched
   several repair and rebuild processes on the node.
4. Then they asked me to help them.
5. We stopped the node,
6. removed it from the list of seeds (more precisely, it was replaced
   by another node),
7. removed it from the cluster (I chose not to use decommission since
   the node's data was compromised),
8. deleted all files from the data, commitlog and saved_caches
   directories,
9. and after the leaving process ended, it was started as a fresh new
   node and began auto-bootstrap.

As I don't have direct access to the cluster I don't have a lot of information, but I will have tomorrow (logs and results of some commands). And I can ask people for any required information.

Does someone have any idea of what could have happened and what I should investigate first?
What would you do to unlock the situation?

Context: The cluster consists of two DCs, each with 15 nodes. Average load is around 3 TB per node. The joining node froze a little after 2 TB.

Thank you for your help.
Cheers,

--
Jérôme Mainaud
jer...@mainaud.com
Re: unsubscibe
On Mon, Aug 15, 2016 at 10:28 AM Eric Evans wrote:
>
> I'm always surprised when a Google search for 'unsubscribe cassandra'
> doesn't return mailing list results from people nicely telling someone
> how to unsubscribe.

Agreed. It doesn't make for a very welcoming community to attract new contributors, either.
Re: unsubscibe
On Sat, Aug 13, 2016 at 7:24 PM, James Carman wrote:
> Was the Google stuff really necessary? Couldn't you have just nicely
> told them how to unsubscribe?

I'm always surprised when a Google search for 'unsubscribe cassandra' doesn't return mailing list results from people nicely telling someone how to unsubscribe.

--
Eric Evans
john.eric.ev...@gmail.com
unsubscribe
unsubscribe
Failure when setting up cassandra in cluster
Hi all,

Sorry if this is a fairly stupid question, but we've all only been exposed to Cassandra very recently.

We're trying to configure a 2-node cluster with non-default credentials. Here's what I've been doing so far based on my understanding of the documentation. The platform is RHEL 7:

1. Use an RPM I found with Datastax to perform a basic Cassandra installation.
2. Change the temporary directory in cassandra-env.sh, because nobody is allowed to execute anything in /tmp.
3. In cassandra.yaml:
   - change the cluster_name
   - empty the listen_address entry
   - define both VMs as seeds
4. Open port 7000 in the firewall.
5. Start Cassandra.
6. In cassandra.yaml, change to PasswordAuthenticator.
7. Run cqlsh -u cassandra -p cassandra -e "ALTER KEYSPACE system_auth WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };"
8. Restart Cassandra.
9. Perform 1-8 on the second node.
10. To create a new user, run cqlsh -u cassandra -p cassandra -e "CREATE USER ${CASSANDRA_USERNAME} WITH PASSWORD '${CASSANDRA_PASSWORD}' SUPERUSER;"

Step 10 fails with this error:

Connection error: ('Unable to connect to any servers', {'127.0.0.1': AuthenticationFailed(u'Failed to authenticate to 127.0.0.1: code=0100 [Bad credentials] message="org.apache.cassandra.exceptions.UnavailableException: Cannot achieve consistency level QUORUM"',)})

What am I missing?

Cheers
Raimund
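One plausible reading of the QUORUM error, offered as a hypothesis rather than a confirmed diagnosis: logins as the default "cassandra" superuser read system_auth at QUORUM, and QUORUM is floor(RF/2)+1, so with replication_factor 2 both nodes must be up and hold the auth rows before step 10 can succeed. A tiny sketch of the arithmetic:

```shell
#!/bin/sh
# QUORUM for a keyspace is floor(RF/2)+1. With the system_auth keyspace
# at RF=2, QUORUM is 2, so the default "cassandra" superuser cannot log
# in unless BOTH replicas are reachable and repaired.
quorum() { echo $(( $1 / 2 + 1 )); }

quorum 2   # RF=2: both replicas required, no fault tolerance
quorum 3   # RF=3: one replica may be down
```

Under that hypothesis, after step 7 it may help to make sure both nodes are up and run "nodetool repair system_auth" so the auth rows actually exist on the second node, then retry the CREATE USER from step 10.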
Cassandra 2.1.16 Release
Hey, I'd like to ask when you are going to release cassandra 2.1.16 especially because of https://issues.apache.org/jira/browse/CASSANDRA-11850 Best, Malte