Sorry, please ignore this if you like. It looks like the network throughput is very low, but every worker/executor machine is indeed working.
My current incoming network throughput on each worker machine is about 2.5 KB/s (kilobytes per second), when it needs to be somewhere around 5-6 MB/s, which means the table scan counting a billion rows in Cassandra is somehow not being done in parallel.

On Wed, Nov 23, 2016 at 12:45 PM, kant kodali <kanth...@gmail.com> wrote:

> Hi All,
>
> Spark Shell doesn't seem to use the Spark workers, but spark-submit does. I
> have the worker IPs listed in the conf/slaves file.
>
> I am trying to count the number of rows in Cassandra using spark-shell, so I
> do the following on the Spark master:
>
> val df = spark.sql("SELECT test from hello") // This has about a billion rows
>
> scala> df.count
>
> [Stage 0:=> (686 + 2) / 24686] // What are these numbers precisely?
>
> This is taking forever, so I checked the I/O, CPU, and network usage with
> dstat, iostat, and so on. It looks like nothing is going on in the worker
> machines, but on the master I can see activity.
>
> I am using Spark 2.0.2.
>
> Any ideas on what is going on, and how to fix it?
>
> Thanks,
>
> kant
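A per-worker read rate this low often means the scan produced too few input partitions for the cluster. A minimal spark-shell sketch of things worth checking, assuming the DataStax spark-cassandra-connector is on the classpath (the keyspace name "ks" here is a placeholder; "hello" is the table from the original message):

```scala
// Run inside spark-shell with the DataStax spark-cassandra-connector loaded.
import com.datastax.spark.connector._

// 1. How many partitions (= tasks) did the scan produce? If this number is
//    tiny, only a few executors can work on the scan at once.
val df = spark.sql("SELECT test from hello")
println(df.rdd.getNumPartitions)

// 2. The connector can push the count down to Cassandra per token range
//    instead of shipping every row into Spark.
val rows = sc.cassandraTable("ks", "hello").cassandraCount()
println(rows)

// 3. Smaller input splits -> more tasks -> more parallelism. The split size
//    (MB per Spark partition) can be lowered when launching the shell, e.g.:
//    spark-shell --conf spark.cassandra.input.split.size_in_mb=32
```

This is only a sketch; the exact config key and API names should be checked against the connector version in use (2.0.x here), since they have changed between releases.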