My sincere apologies for the late response. You're right, Jonathan. That's also a fine approach.
Anyway, I was tied up with other activities, so I couldn't check whether the
issue still persisted. Although I had created proper security group rules and
my cluster was in a stopped state, I hadn't actually cleared the pending
application queue when I last stopped the cluster. Yesterday, when I started
the cluster, I saw that around 720 applications were queued up. I killed
those and didn't notice any further applications getting queued. Today, when
I started the cluster again, the queue was empty. So I can confirm that it
was indeed an attack on my cluster earlier.

Regards.

On Sat, 13 Jun 2020 at 17:04, Jonathan Aquilina <jaquil...@eagleeyet.net> wrote:

> What you are saying is a bit of an easy fix.
>
> On the Azure network security group, lock down those public IP addresses so
> that they are accessible only from your IP address or from those IP
> addresses that are meant to have access.
>
> Regards,
> Jonathan Aquilina
> EagleEyeT
>
> Phone: +356 2033 0099
> Mobile: +356 7995 7942
> Email: sa...@eagleeyet.net
> Website: https://eagleeyet.net
>
> *From:* Gaurav Chhabra <varuag.chha...@gmail.com>
> *Sent:* 13 June 2020 11:45
> *To:* Hariharan <hariharan...@gmail.com>
> *Cc:* common-u...@hadoop.apache.org <user@hadoop.apache.org>
> *Subject:* Re: Applications always showing in pending state even after
> cluster restart
>
> Wow! What a guess, Hari! :) I wasn't sure those pending tasks could have
> been related to an attack. This happened to me from 1st to 5th June '20. I
> didn't check my Azure usage during that time, though I had been keeping
> tabs on it almost every day in May. On 8th June (Mon), when I checked the
> charges, the Azure 'data transfer out' charges were showing $88, $90 & $110
> for bigdataserver-{5,6,7} respectively. I was shocked, as my charge for the
> previous month was around $53.
> I opened a ticket with Azure, and then we started the cluster again (with
> an Azure networking guy along with me). Within 3-4 minutes, data transfer
> out was again around 10-12 GB in total (from 3 instances). We could only
> figure out that the hits were going to some blob storage in Azure. He said
> it most likely seems to be a virus or some attack.
>
> I have now removed public IPs from all instances except two (one where
> Cloudera Manager is hosted and another where Resource Manager is running).
> Even those two exposed ones accept incoming requests only from my laptop's
> IP. Things are fine now.
>
> One thing that I don't get is how the attacker is 'personally' benefiting
> from this, apart from obviously raising my monthly bill.
>
> Regards
>
> On Sat, 13 Jun 2020 at 11:00, Hariharan <hariharan...@gmail.com> wrote:
>
> This is most likely an attempt to attack your system. If you are running
> your cluster in the cloud, you should run it in a private network so it is
> not exposed to the Internet. Alternatively, you can secure your
> installation as described here -
> https://blog.cloudera.com/how-to-secure-internet-exposed-apache-hadoop/
>
> Thanks,
> Hari
>
> On Fri, 12 Jun 2020, 12:20 Gaurav Chhabra, <varuag.chha...@gmail.com>
> wrote:
>
> Hi All,
>
> I have started learning Hadoop and its related components. I am following
> a tutorial on Hadoop Administration on Udemy. As part of the learning
> process, I ran the following command:
>
> $ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
>     randomtextwriter \
>     -Ddfs.replication=1 \
>     /user/bigdata/randomtextwriter
>
> The above command created 30 files, each 1 GB in size.
> Then I ran the below reduce command:
>
> $ yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
>     wordcount \
>     -Dmapreduce.input.fileinputformat.split.minsize=268435456 \
>     -Dmapreduce.job.reduces=8 \
>     /user/bigdata/randomtext \
>     /user/bigdata/wordcount
>
> After executing the above command, I decided to kill the application after
> some time, so I first ran 'yarn application -list', which listed a large
> number of applications, one of which was *wordcount*. I killed that
> particular application using 'yarn application -kill application-id'.
> However, when I checked the scheduler, I could see that several
> applications were still showing in Pending state, so I ran the following
> command:
>
> $ for x in $(yarn application -list -appStates ACCEPTED | awk 'NR > 2 { print $1 }'); do
>     yarn application -kill "$x"
>   done
>
> It was killing the applications, as I could see the 'Apps Completed' count
> going up, but as soon as all the apps had been killed, I saw those
> applications getting created again. Even if I stop the whole cluster and
> start it again, the scheduler shows that there are submitted applications
> in Pending state.
>
> Here's the content of fair-scheduler.xml:
>
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> <allocations>
>     <queue name="root">
>         <schedulingPolicy>drf</schedulingPolicy>
>         <queue name="default">
>             <schedulingPolicy>drf</schedulingPolicy>
>         </queue>
>     </queue>
>     <queuePlacementPolicy>
>         <rule name="specified" create="false"/>
>         <rule name="default" create="true"/>
>     </queuePlacementPolicy>
> </allocations>
>
> This is just a test cluster. I just want to kill the applications and
> clear the application queue. Any help will really be appreciated, as I
> have been struggling with this for the last few days.
>
> Regards
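
For anyone landing on this thread with the same symptom, the cleanup loop from the original question can be written a little more defensively. This is only a sketch, assuming the usual `yarn application -list` layout in which every application row starts with an ID of the form application_<timestamp>_<sequence>. The function names `extract_app_ids` and `kill_accepted_apps` are made up for illustration and are not part of Hadoop.

```shell
#!/bin/sh
# Print only the application IDs found in `yarn application -list` output
# read from stdin. Matching on the ID pattern, rather than skipping a fixed
# number of header lines, is an assumption about the output layout.
extract_app_ids() {
  awk '$1 ~ /^application_[0-9]+_[0-9]+$/ { print $1 }'
}

# Kill every application currently in the ACCEPTED (pending) state.
# Hypothetical helper; it needs the `yarn` CLI on PATH and is only defined
# here, not called.
kill_accepted_apps() {
  yarn application -list -appStates ACCEPTED 2>/dev/null |
    extract_app_ids |
    while read -r app_id; do
      yarn application -kill "$app_id"
    done
}
```

Running `kill_accepted_apps` on a cluster node would clear the pending queue once. As the thread shows, that only treats the symptom: the applications stopped reappearing only after the public IPs were removed or restricted to known addresses.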