Re: how to submit the spark job outside the cluster

2015-09-25 Thread Zhiliang Zhu
It seems that this is due to Spark's SPARK_LOCAL_IP setting.
export SPARK_LOCAL_IP=localhost
does not work. Then how should it be set?
Thank you all~~
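
(A rough sketch only, not confirmed in this thread: SPARK_LOCAL_IP has to be an address that is actually configured on one of the gateway's interfaces, and in yarn-client mode the cluster nodes must also be able to route back to it; if the gateway sits behind NAT with its ports blocked, running the driver inside the cluster with yarn-cluster mode avoids that problem. The address, class, and jar below are placeholders / the bundled example.)

  # placeholder address: must exist on a local interface AND be reachable from the cluster
  export SPARK_LOCAL_IP=42.62.77.100
  $SPARK_HOME/bin/spark-submit --master yarn-client \
    --class org.apache.spark.examples.SparkPi \
    $SPARK_HOME/lib/spark-examples-*.jar 10

  # or keep the driver inside the cluster, so the gateway's own address matters much less
  $SPARK_HOME/bin/spark-submit --master yarn-cluster \
    --class org.apache.spark.examples.SparkPi \
    $SPARK_HOME/lib/spark-examples-*.jar 10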
 



Re: how to submit the spark job outside the cluster

2015-09-25 Thread Zhiliang Zhu
And the remote machine is not in the same local area network as the cluster.
 



Re: how to submit the spark job outside the cluster

2015-09-25 Thread Zhiliang Zhu
Hi Steve,
Thanks a lot for your reply.
That is, some commands work on the remote server where the gateway is installed, but some other commands do not. As expected, the remote machine is not in the same local area network as the cluster, and the cluster's ports are blocked.
When I make the remote machine a gateway for another local-area cluster, it works fine, and the hadoop job can be submitted from that machine remotely.
However, I want to submit Spark jobs remotely the same way the hadoop jobs are. On the gateway machine I also copied the Spark install directory from the cluster, and conf/spark-env.sh is there as well. But submitting a Spark job remotely still fails... The error messages:
15/09/25 17:47:47 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/09/25 17:47:47 INFO Remoting: Starting remoting
15/09/25 17:47:48 ERROR netty.NettyTransport: failed to bind to 
/220.250.64.225:0, shutting down Netty transport
15/09/25 17:47:48 WARN util.Utils: Service 'sparkDriver' could not bind on port 
0. Attempting port 1.
15/09/25 17:47:48 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
Shutting down remote daemon.
15/09/25 17:47:48 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote 
daemon shut down; proceeding with flushing remote transports.
15/09/25 17:47:48 INFO remote.RemoteActorRefProvider$RemotingTerminator: 
Remoting shut down.

...
Would you help with this...
Thank you very much! Zhiliang

 



Re: how to submit the spark job outside the cluster

2015-09-25 Thread Steve Loughran

On 25 Sep 2015, at 05:25, Zhiliang Zhu wrote:


However, I could only use "hadoop fs -ls/-mkdir/-rm XXX" commands on the remote machine with the gateway,


which means the namenode is reachable; all those commands only need to interact 
with it.

but commands "hadoop fs -cat/-put XXXYYY" would not work with error message 
as below:

put: File /user/zhuzl/wordcount/input/1._COPYING_ could only be replicated to 0 
nodes instead of minReplication (=1).  There are 2 datanode(s) running and 2 
node(s) are excluded in this operation.
15/09/25 10:44:00 INFO hdfs.DFSClient: Exception in createBlockOutputStream
org.apache.hadoop.net.ConnectTimeoutException: 6 millis timeout while 
waiting for channel to be ready for connect. ch : 
java.nio.channels.SocketChannel[connection-pending remote=/10.6.28.96:50010]


the client can't reach the datanodes
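
(A quick sketch for confirming that from the gateway, using the datanode address from the log above; 50010 is the default datanode transfer port.)

  # can the gateway reach the datanode the client was told to write to?
  nc -vz -w 5 10.6.28.96 50010
  # which addresses is the namenode advertising for its datanodes?
  hdfs dfsadmin -report | grep -E 'Name:|Hostname:'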


Re: how to submit the spark job outside the cluster

2015-09-24 Thread Zhiliang Zhu
Hi Zhan,
I have done that with your kind help.
However, I can only use "hadoop fs -ls/-mkdir/-rm XXX" commands on the remote machine with the gateway; commands like "hadoop fs -cat/-put XXX YYY" do not work, with error messages as below:
put: File /user/zhuzl/wordcount/input/1._COPYING_ could only be replicated to 0 
nodes instead of minReplication (=1).  There are 2 datanode(s) running and 2 
node(s) are excluded in this operation.
15/09/25 10:44:00 INFO hdfs.DFSClient: Exception in createBlockOutputStream
org.apache.hadoop.net.ConnectTimeoutException: 6 millis timeout while 
waiting for channel to be ready for connect. ch : 
java.nio.channels.SocketChannel[connection-pending remote=/10.6.28.96:50010]...
In the cluster, every machine's /etc/hosts is:
10.6.32.132  master   # all local area network IPs
10.6.28.96   core1    # must this entry use the global IP so the remote machine can operate?
10.6.26.160  core2

In the remote machine's /etc/hosts:
42.62.77.77  master   # all global IPs, or else no commands work at all
42.62.77.81  core1    # but -cat / -put still do not work
42.62.77.83  core2

Would you help comment on this...
Thank you very much! Zhiliang
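
(Only a sketch, not confirmed in this thread: the namenode hands the client the datanodes' internal 10.6.x.x addresses, which the remote machine cannot route to. Telling the HDFS client to connect to datanodes by hostname instead lets the /etc/hosts mapping to the global IPs take effect; the datanode port, 50010 by default, must still be open to the remote machine.)

  # one-off test from the remote machine
  hadoop fs -D dfs.client.use.datanode.hostname=true -put 1 /user/zhuzl/wordcount/input/
  # or set it permanently in the gateway's hdfs-site.xml:
  #   <property>
  #     <name>dfs.client.use.datanode.hostname</name>
  #     <value>true</value>
  #   </property>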
 




Re: how to submit the spark job outside the cluster

2015-09-22 Thread Zhan Zhang
It should be similar to other hadoop jobs. You need the hadoop configuration on your 
client machine, and point HADOOP_CONF_DIR in Spark at that configuration.

Thanks

Zhan Zhang
On Sep 22, 2015, at 6:37 PM, Zhiliang Zhu wrote:

Dear Experts,

The Spark job runs on the cluster via YARN. The job can be submitted on a machine 
that belongs to the cluster; however, I would like to submit the job from another 
machine which does not belong to the cluster.
I know that for hadoop this can be done from another machine on which a hadoop 
gateway is installed and used to connect to the cluster.

Then what about Spark, is it the same as for hadoop... And where is the 
instruction doc for installing this gateway...

Thank you very much~~
Zhiliang




Re: how to submit the spark job outside the cluster

2015-09-22 Thread Zhan Zhang
Hi Zhiliang,

I cannot find a specific doc. But as far as I remember, you can log in to one of 
your cluster machines, find the hadoop configuration location, for example 
/etc/hadoop/conf, and copy that directory to your local machine.
Typically it has hdfs-site.xml, yarn-site.xml, etc. In Spark, the former is used 
to access HDFS, and the latter is used to launch applications on top of YARN.

Then in the spark-env.sh, you add export HADOOP_CONF_DIR=/etc/hadoop/conf.

Thanks.

Zhan Zhang
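
(Putting those steps together as a sketch; the paths and the "master" hostname are just examples for this thread's cluster.)

  # 1. copy the cluster's client configuration to the gateway machine
  scp -r master:/etc/hadoop/conf /etc/hadoop/conf
  # 2. point both the hadoop client and Spark at it
  export HADOOP_CONF_DIR=/etc/hadoop/conf
  echo 'export HADOOP_CONF_DIR=/etc/hadoop/conf' >> $SPARK_HOME/conf/spark-env.sh
  # 3. sanity-check that the gateway can see the cluster through the copied config
  hadoop fs -ls /
  yarn application -list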



Re: how to submit the spark job outside the cluster

2015-09-22 Thread Zhiliang Zhu
Hi Zhan,
Yes, I get it now.
I have never deployed the hadoop configuration locally and cannot find the 
specific doc; would you help provide the doc for doing that...
Thank you, Zhiliang


Re: how to submit the spark job outside the cluster

2015-09-22 Thread Zhiliang Zhu
Hi Zhan,
Thanks very much for your helpful comment. I also expected it would be similar to 
a hadoop job submission; however, I was not sure whether that is the case when it 
comes to Spark.
Have you ever tried that for Spark... Would you give me the deployment doc for the 
hadoop and spark gateway? Since this is the first time for me to do this, I cannot 
find the specific doc for it.

Best Regards, Zhiliang

 



Re: how to submit the spark job outside the cluster

2015-09-22 Thread Zhan Zhang
There is no difference between running the client inside or outside the cluster 
(assuming there is no firewall or network connectivity issue), as long as you 
have the hadoop configuration locally. Here is the doc for running on YARN.

http://spark.apache.org/docs/latest/running-on-yarn.html

Thanks.

Zhan Zhang


Re: how to submit the spark job outside the cluster

2015-09-22 Thread Zhiliang Zhu
Hi Zhan,
I really appreciate your help; I will do that next. And on the local machine, no 
hadoop/spark needs to be installed, only the /etc/hadoop/conf... is copied over. 
Does any information about the local machine (for example its IP or hostname) 
need to be set in those conf files?

Moreover, do you have any experience submitting a hadoop/spark job by way of a 
Java program deployed on the gateway node, rather than by way of the hadoop/spark 
commands...
Thank you very much~ Best Regards, Zhiliang
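
(On the first question: as far as I can tell the copied conf files only name the cluster side, the namenode and resourcemanager addresses, so nothing about the gateway itself needs to go into them. On the second: a program on the gateway can simply drive the same client command; a rough sketch of a wrapper it could exec is below, with placeholder class and jar names. Newer Spark releases also ship a programmatic launcher, org.apache.spark.launcher.SparkLauncher, which does the same thing from Java.)

  #!/bin/sh
  # minimal wrapper a gateway-side program could run instead of typing the commands;
  # com.example.MyJob and /opt/jobs/my-job.jar are placeholders
  export HADOOP_CONF_DIR=/etc/hadoop/conf
  exec $SPARK_HOME/bin/spark-submit \
    --master yarn-cluster \
    --class com.example.MyJob \
    /opt/jobs/my-job.jar "$@"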


 

