Github user brkyvz commented on the pull request:
https://github.com/apache/spark/pull/9373#issuecomment-154861924
@harishreedharan I've been trying to test this patch, but I just couldn't
set up HDFS to work with Spark using the spark-ec2 scripts. Could you please
help me set up a cluster with HDFS so that I can benchmark this?
Basically, I can get HDFS up and running on the cluster, but Spark can't
access it. get the following exception when I use Hadoop 2:
```
scala> sc.parallelize(1 to 5).saveAsTextFile("hdfs:///trial")
java.io.IOException: Failed on local exception:
com.google.protobuf.InvalidProtocolBufferException: Protocol message contained
an invalid tag (zero).; Host Details : local host is:
"ip-172-31-13-113.us-west-2.compute.internal/172.31.13.113"; destination host
is: "ec2-52-32-160-227.us-west-2.compute.amazonaws.com":9000;
```
That looks like a Protobuf version incompatibility.
I launched the ec2 instances using:
```
ec2/spark-ec2 --spark-git-repo=https://github.com/brkyvz/spark
--spark-version=fc2951f6530bde932a0bc97f430c6c360eb03209 -s 2 --spot-price=0.2
-t m4.large --no-ganglia -i ... -k ... -r us-west-2 --hadoop-major-version 2
launch burak-streaming-test-2
```
I used to get the following when using `--hadoop-major-version 1`:
```
java.io.IOException: Failed on local exception: java.io.EOFException; Host
Details : local host is:
"ip-172-31-30-170.us-west-2.compute.internal/172.31.30.170"; destination host
is: "ec2-52-32-211-30.us-west-2.compute.amazonaws.com":9000;
```
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]