#2: If you're using HDFS, the data is on the cluster's disks. You can browse
it with the HDFS command line. Then use s3distcp (or plain distcp) to copy
data from HDFS to S3, or use hdfs dfs -get to copy to local disk and then
use the S3 CLI to upload to S3.
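The copy options above might look roughly like this (bucket and path names
are placeholders; this assumes a standard Hadoop install with the AWS CLI
available):

```shell
# Option 1: distcp straight from HDFS to S3 via the s3a connector
hadoop distcp hdfs:///user/spark/data s3a://my-bucket/data

# Option 2: on EMR, s3-dist-cp is optimized for S3 transfers
s3-dist-cp --src hdfs:///user/spark/data --dest s3://my-bucket/data

# Option 3: pull down to local disk, then upload with the AWS CLI
hdfs dfs -get /user/spark/data /tmp/data
aws s3 cp /tmp/data s3://my-bucket/data --recursive
```

Option 3 is only practical for small data sets, since everything has to fit
on one machine's local disk.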
#3: Cost of accessing data in S3 from EC2 nodes, t
If you are running on AWS, I would recommend using S3 instead of HDFS as a
general practice if you are maintaining state or data there. This way you can
treat your Spark clusters as ephemeral compute resources that you can swap out
easily -- e.g. if something breaks, just spin up a fresh cluster and
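A minimal sketch of that pattern, assuming the hadoop-aws (s3a) connector is
on the classpath and credentials come from the EC2 instance profile; the
bucket and column names are placeholders:

```python
from pyspark.sql import SparkSession

# Build a session; on EC2 the s3a connector can pick up credentials
# from the instance profile, so no keys are hard-coded here.
spark = SparkSession.builder.appName("ephemeral-cluster-demo").getOrCreate()

# Read input from S3 and write results back to S3, not HDFS, so the
# cluster itself holds no state and can be replaced at any time.
df = spark.read.parquet("s3a://my-bucket/events/")
df.groupBy("user_id").count() \
  .write.mode("overwrite").parquet("s3a://my-bucket/reports/daily/")
```

Because all durable state lives in S3, killing this cluster and launching a
new one loses nothing.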
There is no way to upgrade a running cluster here. You can stop a
cluster and simply start a new cluster the same way you started
the original one. That ought to be simple; the only issue I
suppose is that you have down-time, since you have to shut the whole
thing down, but maybe that's acceptable.
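With the spark-ec2 script, that replace-rather-than-upgrade cycle looks
roughly like this (the cluster names, key pair, and identity file are
placeholders):

```shell
# Stop the old cluster (instances are stopped, not terminated)
./spark-ec2 stop my-cluster

# Launch a replacement cluster, e.g. with a newer Spark version
./spark-ec2 --key-pair=my-key --identity-file=my-key.pem \
    --spark-version=1.5.2 launch my-cluster-v2

# Once the new cluster is confirmed healthy, destroy the old one
./spark-ec2 destroy my-cluster
```

If your data lives in S3 rather than HDFS (as recommended above), the new
cluster can pick up exactly where the old one left off.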
Hello,
I have the same questions in mind.
What are the advantages of using EC2 compared to normal servers for
Spark and other big data product development?
Hope to get input from the community.
Thanks,
Divya
On Dec 4, 2015 6:05 AM, "Andy Davidson"
wrote:
About 2 months ago I used spark-ec2 to set up a small cluster. The cluster
runs a Spark Streaming app 24x7 and stores the data to HDFS. I also need to
run some batch analytics on the data.
Now that I have a little more experience, I wonder if this was a good way to
set up the cluster, given the following: