Encountering issue: Application id exceeding 9999

2016-11-21 Thread Divya Gehlot
Hi,
I am working with Hadoop 2.7.2, and my application runs at a short time
interval (say, every 5 mins) with no downtime.
Now that the application id has exceeded 9999, I have hit the below issue:

https://issues.apache.org/jira/browse/YARN-3840

Now my query is: if I update the below properties in yarn-site.xml

yarn.resourcemanager.max-completed-applications
(default 10000, defined in yarn-default.xml)

yarn.timeline-service.generic-application-history.max-applications
(default 10000, defined in yarn-default.xml)

can I then see the sorted application ids in the Hadoop UI and in the yarn logs too?

Thanks,
Divya


read multiple files

2016-09-27 Thread Divya Gehlot
Hi,
The input data files for my Spark job are generated every five minutes, and the
file names follow the epoch-time convention, as below:

InputFolder/batch-147495960
InputFolder/batch-147495990
InputFolder/batch-147496020
InputFolder/batch-147496050
InputFolder/batch-147496080
InputFolder/batch-147496110
InputFolder/batch-147496140
InputFolder/batch-147496170
InputFolder/batch-147496200
InputFolder/batch-147496230

As per the requirement, I need to read one month of data back from the current timestamp.
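
A minimal sketch of one possible approach (my assumptions: the suffix after "batch-" is an epoch timestamp in seconds, the data is plain text, a month is approximated as 30 days, and the root path is a placeholder), runnable in spark-shell where sc is already defined:

import scala.util.Try
import org.apache.hadoop.fs.{FileSystem, Path}

// Cutoff of roughly one month ago, in epoch seconds.
val cutoff = System.currentTimeMillis() / 1000 - 30L * 24 * 60 * 60

val fs = FileSystem.get(sc.hadoopConfiguration)
// Keep only the batch-<epoch> folders whose timestamp falls within the last month.
val recentDirs = fs.listStatus(new Path("/InputFolder"))   // placeholder root path
  .map(_.getPath)
  .filter { p =>
    Try(p.getName.stripPrefix("batch-").toLong).toOption.exists(_ >= cutoff)
  }
  .map(_.toString)

// sc.textFile accepts a comma-separated list of paths, so all folders are read into one RDD.
val data = sc.textFile(recentDirs.mkString(","))
println(data.count())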

Would really appreciate it if anybody could help me.

Thanks,
Divya


[How to :]Apache Impala and S3n File System

2016-09-19 Thread Divya Gehlot
Hi,

Has anybody tried using Apache Impala with the S3n file system?
Could you share the pros and cons of it?
Appreciate the help.


Thanks,
Divya


how to specify cores and executors to run spark jobs simultaneously

2016-09-14 Thread Divya Gehlot
Hi,

I am on an EMR cluster, and my cluster configuration is as below:
Number of nodes (including master node): 3
Memory: 22.50 GB
VCores total: 16
Active nodes: 2
Spark version: 1.6.1

Parameters set in spark-defaults.conf:

> spark.executor.instances 2
> spark.executor.cores 8
> spark.driver.memory 10473M
> spark.executor.memory 9658M
> spark.default.parallelism 32


Do let me know if you need any other info regarding the cluster.

The current configuration for spark-submit is
--driver-memory 5G \
--executor-memory 2G \
--executor-cores 5 \
--num-executors 10 \


Currently, with the above job configuration, if I try to run another Spark
job it stays in the ACCEPTED state until the first one finishes.
How do I optimize or update the above spark-submit configuration to run
more Spark jobs simultaneously?
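
For illustration only, a rough sizing sketch (the numbers are my assumptions, and the real limits depend on the YARN scheduler and yarn.nodemanager.resource.* settings): with 16 vcores and roughly 22 GB in total, a request for 10 executors x 5 cores cannot be satisfied, and whatever the first job does get is held until it finishes. Capping each job at about half the cluster leaves room for a second application master and its executors:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sizing for a 2-node cluster with 16 vcores / ~22.5 GB total:
// each job takes roughly half (2 executors x 3 cores, ~4 GB each, small driver),
// so a second job submitted with the same settings can still be scheduled.
val conf = new SparkConf()
  .setAppName("half-cluster-job")
  .set("spark.executor.instances", "2")
  .set("spark.executor.cores", "3")
  .set("spark.executor.memory", "4g")
  .set("spark.driver.memory", "2g")
  .set("spark.default.parallelism", "12")

val sc = new SparkContext(conf)

The same values can instead be passed to spark-submit as --num-executors, --executor-cores, --executor-memory and --driver-memory.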

Would really appreciate the help.

Thanks,
Divya


Re: [Error:] Viewing Web UI on EMR cluster

2016-09-13 Thread Divya Gehlot
Hi,
Thank you all.
Hurray... I am able to view the Hadoop web UI now @ 8088, and even the Spark
History Server web UI @ 18080.
But I am unable to figure out the Spark UI web port...
I tried 4044 and 4040...
and am getting the below error:
This site can’t be reached
How can I find out the Spark port?

Would really appreciate the help.

Thanks,
Divya


On 13 September 2016 at 15:09, Divya Gehlot <divya.htco...@gmail.com> wrote:

> Hi,
> Thanks all for your prompt response.
> I followed the instructions in the docs EMR SSH tunnel
> <https://docs.aws.amazon.com/ElasticMapReduce/latest/ManagementGuide/emr-ssh-tunnel.html>
> shared by Jonathan.
> I am on a Mac and set up FoxyProxy in my Chrome browser.
>
> Divyas-MacBook-Pro:.ssh divyag$ ssh  -N -D 8157
> had...@ec2-xx-xxx-xxx-xx.ap-southeast-1.compute.amazonaws.com
>
> channel 3: open failed: connect failed: Connection refused
>
> channel 4: open failed: connect failed: Connection refused
>
> channel 5: open failed: connect failed: Connection refused
>
> (the same message repeats for channels 3, 4, 22 and 23)
>
> channel 8: open failed: administratively prohibited: open failed
>
>
> What am I missing now ?
>
>
> Thanks,
>
> Divya
>
> On 13 September 2016 at 14:23, Jonathan Kelly <jonathaka...@gmail.com>
> wrote:
>
>> I would not recommend opening port 50070 on your cluster, as that would
>> give the entire world access to your data on HDFS. Instead, you should
>> follow the instructions found here to create a secure tunnel to the
>> cluster, through which you can proxy requests to the UIs using a browser
>> plugin like FoxyProxy: https://docs.aws.amazon.com/ElasticMapReduce/latest/ManagementGuide/emr-ssh-tunnel.html
>>
>> ~ Jonathan
>>
>> On Mon, Sep 12, 2016 at 10:40 PM Mohammad Tariq <donta...@gmail.com>
>> wrote:
>>
>>> Hi Divya,
>>>
>>> Do you have inbound access enabled on port 50070 of your NN machine? Also,
>>> it's a good idea to have the public DNS in your /etc/hosts for proper name
>>> resolution.
>>>
>>>
>>> Tariq, Mohammad
>>> about.me/mti
>>>
>>> On Tue, Sep 13, 2016 at 9:28 AM, Divya Gehlot <divya.htco...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>> I am on EMR 4.7 with Spark 1.6.1   and Hadoop 2.7.2
>>>> When I am trying to view Any of the web UI of the cluster either hadoop
>>>> or Spark ,I am getting below error
>>>> "
>>>> This site can’t be reached
>>>>
>>>> "
>>>> Has anybody using EMR and able to view WebUI .
>>>> Could you please share the steps.
>>>>
>>>> Would really appreciate the help.
>>>>
>>>> Thanks,
>>>> Divya
>>>>
>>>
>>>
>


Ways to check Spark submit running

2016-09-13 Thread Divya Gehlot
Hi,

Somehow, for the time being, I am unable to view the Spark Web UI and the Hadoop Web UI.
I am looking for other ways to check that my job is running fine, apart from
checking the current yarn logs.
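
One option while the web UIs are unreachable is to query the ResourceManager directly; below is a minimal sketch using the YARN client API (it assumes the Hadoop/YARN client jars and a valid yarn-site.xml are on the classpath). The command-line equivalent is "yarn application -list".

import scala.collection.JavaConverters._
import java.util.EnumSet
import org.apache.hadoop.yarn.api.records.YarnApplicationState
import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration

object ListYarnApps {
  def main(args: Array[String]): Unit = {
    // Connects to the ResourceManager configured in yarn-site.xml.
    val yarn = YarnClient.createYarnClient()
    yarn.init(new YarnConfiguration())
    yarn.start()

    // Only applications that are still ACCEPTED or RUNNING.
    val states = EnumSet.of(YarnApplicationState.ACCEPTED, YarnApplicationState.RUNNING)
    for (app <- yarn.getApplications(states).asScala) {
      println(s"${app.getApplicationId}  ${app.getName}  " +
        s"${app.getYarnApplicationState}  progress=${app.getProgress}")
    }

    yarn.stop()
  }
}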


Thanks,
Divya


[Error:] Viewing Web UI on EMR cluster

2016-09-12 Thread Divya Gehlot
Hi,
I am on EMR 4.7 with Spark 1.6.1 and Hadoop 2.7.2.
When I try to view any of the web UIs of the cluster, either Hadoop or
Spark, I get the below error:
"
This site can’t be reached

"
Has anybody used EMR and been able to view the web UI?
Could you please share the steps?

Would really appreciate the help.

Thanks,
Divya


Re: [Spark submit] getting error when using properties-file parameter in spark submit

2016-09-06 Thread Divya Gehlot
Yes, I am reading from an S3 bucket.
Strangely, the error goes away when I remove the properties-file parameter.

On Sep 6, 2016 8:35 PM, "Sonal Goyal" <sonalgoy...@gmail.com> wrote:

> Looks like a classpath issue - Caused by: java.lang.ClassNotFoundException:
> com.amazonaws.services.s3.AmazonS3
>
> Are you using S3 somewhere? Are the required jars in place?
>
> Best Regards,
> Sonal
> Founder, Nube Technologies <http://www.nubetech.co>
> Reifier at Strata Hadoop World
> <https://www.youtube.com/watch?v=eD3LkpPQIgM>
> Reifier at Spark Summit 2015
> <https://spark-summit.org/2015/events/real-time-fuzzy-matching-with-spark-and-elastic-search/>
>
> <http://in.linkedin.com/in/sonalgoyal>
>
>
>
> On Tue, Sep 6, 2016 at 4:45 PM, Divya Gehlot <divya.htco...@gmail.com>
> wrote:
>
>> Hi,
>> I am getting below error if I try to use properties file paramater in
>> spark-submit
>>
>> Exception in thread "main" java.util.ServiceConfigurationError:
>> org.apache.hadoop.fs.FileSystem: Provider 
>> org.apache.hadoop.fs.s3a.S3AFileSystem
>> could not be instantiated
>> at java.util.ServiceLoader.fail(ServiceLoader.java:224)
>> at java.util.ServiceLoader.access$100(ServiceLoader.java:181)
>> at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:377)
>> at java.util.ServiceLoader$1.next(ServiceLoader.java:445)
>> at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2673)
>> at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSyste
>> m.java:2684)
>> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2701)
>> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
>> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem
>> .java:2737)
>> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2719)
>> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:375)
>> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:174)
>> at org.apache.spark.deploy.yarn.ApplicationMaster.run(Applicati
>> onMaster.scala:142)
>> at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main
>> $1.apply$mcV$sp(ApplicationMaster.scala:653)
>> at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHad
>> oopUtil.scala:69)
>> at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHad
>> oopUtil.scala:68)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:415)
>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGro
>> upInformation.java:1657)
>> at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(Spark
>> HadoopUtil.scala:68)
>> at org.apache.spark.deploy.yarn.ApplicationMaster$.main(Applica
>> tionMaster.scala:651)
>> at org.apache.spark.deploy.yarn.ApplicationMaster.main(Applicat
>> ionMaster.scala)
>> Caused by: java.lang.NoClassDefFoundError: com/amazonaws/services/s3/Amaz
>> onS3
>> at java.lang.Class.getDeclaredConstructors0(Native Method)
>> at java.lang.Class.privateGetDeclaredConstructors(Class.java:2595)
>> at java.lang.Class.getConstructor0(Class.java:2895)
>> at java.lang.Class.newInstance(Class.java:354)
>> at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:373)
>> ... 19 more
>> Caused by: java.lang.ClassNotFoundException:
>> com.amazonaws.services.s3.AmazonS3
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>> ... 24 more
>> End of LogType:stderr
>>
>> If I remove the --properties-file parameter
>> the error is gone
>>
>> Would really appreciate the help .
>>
>>
>>
>> Thanks,
>> Divya
>>
>
>


[Spark submit] getting error when using properties-file parameter in spark submit

2016-09-06 Thread Divya Gehlot
Hi,
I am getting the below error if I try to use the properties-file parameter in
spark-submit:

Exception in thread "main" java.util.ServiceConfigurationError:
org.apache.hadoop.fs.FileSystem: Provider
org.apache.hadoop.fs.s3a.S3AFileSystem could not be instantiated
at java.util.ServiceLoader.fail(ServiceLoader.java:224)
at java.util.ServiceLoader.access$100(ServiceLoader.java:181)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:377)
at java.util.ServiceLoader$1.next(ServiceLoader.java:445)
at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2673)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2684)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2701)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2737)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2719)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:375)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:174)
at
org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:142)
at
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:653)
at
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69)
at
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at
org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68)
at
org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:651)
at
org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: java.lang.NoClassDefFoundError:
com/amazonaws/services/s3/AmazonS3
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2595)
at java.lang.Class.getConstructor0(Class.java:2895)
at java.lang.Class.newInstance(Class.java:354)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:373)
... 19 more
Caused by: java.lang.ClassNotFoundException:
com.amazonaws.services.s3.AmazonS3
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 24 more
End of LogType:stderr

If I remove the --properties-file parameter, the error goes away.

Would really appreciate the help.



Thanks,
Divya


[Spark-Submit:]Error while reading from s3n

2016-09-06 Thread Divya Gehlot
Hi,
I am on EMR 4.7 with Spark 1.6.1.
I am trying to read from s3n buckets in Spark.
Option 1:
If I set up

hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3.S3FileSystem")
hadoopConf.set("fs.s3.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))
hadoopConf.set("fs.s3.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))


and access the bucket as s3://bucket-name,

I get the below error:

Exception in thread "main" java.io.IOException: /batch-147313410
doesn't exist
at 
org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:170)
at 
org.apache.hadoop.fs.s3.Jets3tFileSystemStore.retrieveINode(Jets3tFileSystemStore.java:221)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy37.retrieveINode(Unknown Source)
at 
org.apache.hadoop.fs.s3.S3FileSystem.getFileStatus(S3FileSystem.java:340)
at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1730)
at 
org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:231)
at 
org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:201)
at 
org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:281)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)

Option 2:

If I set up

hadoopConf.set("fs.s3n.impl",
"org.apache.hadoop.fs.s3native.NativeS3FileSystem")
hadoopConf.set("fs.s3n.awsSecretAccessKey", sys.env("AWS_SECRET_ACCESS_KEY"))
hadoopConf.set("fs.s3n.awsAccessKeyId", sys.env("AWS_ACCESS_KEY_ID"))


and try to access the bucket as s3n://bucket-name,

I get the below error:


Caused by: org.apache.hadoop.security.AccessControlException:
Permission denied: s3n://bucket-name/batch-147313710_$folder$
at 
org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:449)
at 
org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:427)
at 
org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleException(Jets3tNativeFileSystemStore.java:411)
at 
org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:181)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)




When I try to list the bucket using the AWS CLI

aws s3 ls s3://bucket-name/


It gives me the bucket listing.
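
For comparison, a minimal sketch (purely an assumption on my side, not something verified on this cluster) of the same read through the s3a connector that ships with Hadoop 2.7; it assumes the hadoop-aws jar and a matching aws-java-sdk jar are on the driver and executor classpaths, and that the bucket and prefix names are placeholders:

// In spark-shell, where `sc` is already available.
val hc = sc.hadoopConfiguration
hc.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
hc.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

// Read every batch-* folder under the bucket via s3a instead of s3/s3n.
val rdd = sc.textFile("s3a://bucket-name/batch-*")
println(rdd.count())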


Would really appreciate the help.



Thanks,

Divya


[How To :]Copy Merge + S3 File System

2016-07-25 Thread Divya Gehlot
Hi,

> import java.net.URI
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}
>
> val srcPath = "s3n://bucket_name_1"
> val dstPath = "s3n://bucket_name_2"
> val config = new Configuration()
> // Note: the same FileSystem (resolved from srcPath) is used for both source and destination.
> val fs = FileSystem.get(URI.create(srcPath), config)
> FileUtil.copyMerge(fs, new Path(srcPath), fs, new Path(dstPath), false,
> config, null)


I am trying to use copyMerge. It is not throwing an error, but it is not merging
the files from one bucket into the other.

Would really appreciate the help.

Thanks,
Divya


[Error:] When writing To Phoenix 4.4

2016-04-11 Thread Divya Gehlot
Hi,
I am getting an error when I try to write data to Phoenix.
*Software Configuration:*
Spark 1.5.2
Phoenix 4.4
Hbase 1.1

*Spark Scala Script :*
val dfLCR = readTable(sqlContext, "", "TEST")
val schemaL = dfLCR.schema
val lcrReportPath = "/TestDivya/Spark/Results/TestData/"
val dfReadReport=
sqlContext.read.format("com.databricks.spark.csv").option("header",
"true").schema(schemaL).load(lcrReportPath)
dfReadReport.show()
val dfWidCol = dfReadReport.withColumn("RPT_DATE",lit("2015-01-01"))
val dfSelect = dfWidCol.select("RPT_DATE")
dfSelect.write.format("org.apache.phoenix.spark").mode(SaveMode.Overwrite).options(collection.immutable.Map(
"zkUrl" -> "localhost",
"table" -> "TEST")).save()

*Command Line to run Script *
spark-shell  --conf
"spark.driver.extraClassPath=/usr/hdp/2.3.4.0-3485/phoenix/phoenix-client.jar"
 --conf
"spark.executor.extraClassPath=/usr/hdp/2.3.4.0-3485/phoenix/phoenix-client.jar"
--properties-file  /TestDivya/Spark/Phoenix.properties --jars
/usr/hdp/2.3.4.0-3485/phoenix/lib/phoenix-spark-4.4.0.2.3.4.0-3485.jar,/usr/hdp/2.3.4.0-3485/phoenix/phoenix-client.jar
 --driver-class-path
/usr/hdp/2.3.4.0-3485/phoenix/lib/phoenix-spark-4.4.0.2.3.4.0-3485.jar,/usr/hdp/2.3.4.0-3485/hbase/lib/phoenix-client-4.4.0.jar
 --packages com.databricks:spark-csv_2.10:1.4.0  --master yarn-client -i
/TestDivya/Spark/WriteToPheonix.scala

*Error Stack Trace :*
16/04/12 02:53:59 INFO YarnScheduler: Removed TaskSet 3.0, whose tasks have
all completed, from pool
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1
in stage 3.0 failed 4 times, most recent failure: Lost task 1.3 in stage
3.0 (TID 410, ip-172-31-22-135.ap-southeast-1.compute.internal):
java.lang.RuntimeException: java.sql.SQLException: No suitable driver found
for jdbc:phoenix:localhost:2181:/hbase-unsecure;
at
org.apache.phoenix.mapreduce.PhoenixOutputFormat.getRecordWriter(PhoenixOutputFormat.java:58)
at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1030)
at
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1014)
at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.sql.SQLException: No suitable driver found for
jdbc:phoenix:localhost:2181:/hbase-unsecure;
at java.sql.DriverManager.getConnection(DriverManager.java:596)
at java.sql.DriverManager.getConnection(DriverManager.java:187)
at
org.apache.phoenix.mapreduce.util.ConnectionUtil.getConnection(ConnectionUtil.java:99)
at
org.apache.phoenix.mapreduce.util.ConnectionUtil.getOutputConnection(ConnectionUtil.java:82)
at
org.apache.phoenix.mapreduce.util.ConnectionUtil.getOutputConnection(ConnectionUtil.java:70)
at
org.apache.phoenix.mapreduce.PhoenixRecordWriter.<init>(PhoenixRecordWriter.java:49)
at
org.apache.phoenix.mapreduce.PhoenixOutputFormat.getRecordWriter(PhoenixOutputFormat.java:55)
... 8 more

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org
$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
at scala.Option.foreach(Option.scala:236)
at
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1496)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)
at 

[HELP:]Save Spark Dataframe in Phoenix Table

2016-04-07 Thread Divya Gehlot
Hi,
I have a Hortonworks Hadoop cluster with the below configuration:
Spark 1.5.2
HBASE 1.1.x
Phoenix 4.4

I am able to connect to Phoenix through a JDBC connection and to read
the Phoenix tables,
but while writing the data back to a Phoenix table
I am getting the below error:

org.apache.spark.sql.AnalysisException:
org.apache.phoenix.spark.DefaultSource does not allow user-specified
schemas.;

Can anybody help in resolving the above error, or suggest any other way of
saving Spark DataFrames to Phoenix?
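
For reference, a minimal sketch of the write path without a user-specified schema (my assumption, based on the phoenix-spark options used in another post in this digest; df, the zkUrl value, and the TEST table are placeholders, and the table must already exist in Phoenix with matching columns):

import org.apache.spark.sql.SaveMode

// df is an existing DataFrame whose column names match the Phoenix table TEST.
df.write
  .format("org.apache.phoenix.spark")
  .mode(SaveMode.Overwrite)
  .options(Map("table" -> "TEST", "zkUrl" -> "zk-host:2181"))
  .save()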

Would really appreciate the help.

Thanks,
Divya


[Query:]Table creation with column family in Phoenix

2016-03-11 Thread Divya Gehlot
Hi,
I created a table in Phoenix with three column families and inserted the
values as shown below.

Syntax :

> CREATE TABLE TESTCF (MYKEY VARCHAR NOT NULL PRIMARY KEY, CF1.COL1 VARCHAR,
> CF2.COL2 VARCHAR, CF3.COL3 VARCHAR)
> UPSERT INTO TESTCF (MYKEY,CF1.COL1,CF2.COL2,CF3.COL3) values
> ('Key1','CF1','CF2','CF3')
> UPSERT INTO TESTCF (MYKEY,CF1.COL1,CF2.COL2,CF3.COL3)values
> ('Key2','CF12','CF22','CF32')


When I try to scan the same table in HBase:
hbase(main):010:0> scan "TESTCF"

> ROW   COLUMN+CELL
>  Key1 column=CF1:COL1, timestamp=1457682385805, value=CF1
>  Key1 column=CF1:_0, timestamp=1457682385805, value=
>  Key1 column=CF2:COL2, timestamp=1457682385805, value=CF2
>  Key1 column=CF3:COL3, timestamp=1457682385805, value=CF3
>  Key2 column=CF1:COL1, timestamp=1457682426396, value=CF12
>  Key2 column=CF1:_0, timestamp=1457682426396, value=
>  Key2 column=CF2:COL2, timestamp=1457682426396, value=CF22
>  Key2 column=CF3:COL3, timestamp=1457682426396, value=CF32
> 2 row(s) in 0.0260 seconds


My query is: why am I getting one extra column, CF1:_0, in each row with no
value?

Can anybody explain this to me?
Would really appreciate the help.

Thanks,
Divya


[Error]Run Spark job as hdfs user from oozie workflow

2016-03-09 Thread Divya Gehlot
Hi,
I have a non-secure Hadoop 2.7.2 cluster on EC2 running Spark 1.5.2.
I am submitting my Spark Scala script through a shell script using an Oozie
workflow.
I am submitting the job as the hdfs user, but it is running as user = "yarn", so all
of the output gets stored under the user/yarn directory only.

When I googled, I came across YARN-2424
for non-secure clusters.
I changed the settings as per those docs,
and when I ran my Oozie workflow as the hdfs user I got the below error:

Application application_1457494230162_0004 failed 2 times due to AM
Container for appattempt_1457494230162_0004_02 exited with exitCode:
-1000
For more detailed output, check application tracking page:
http://ip-xxx-xx-xx-xxx.ap-southeast-1.compute.internal:8088/cluster/app/application_1457494230162_0004
Then, click on links to logs of each attempt.
Diagnostics: Application application_1457494230162_0004 initialization
failed (exitCode=255) with output: main : command provided 0
main : run as user is hdfs
main : requested yarn user is hdfs
Can't create directory
/hadoop/yarn/local/usercache/hdfs/appcache/application_1457494230162_0004 -
Permission denied
Did not create any app directories
Failing this attempt. Failing the application.

After changing the setting, when I start spark-shell
I get an error saying "Error starting SQLContext - Yarn application has
ended".

Has anybody run into this kind of issue?
Would really appreciate it if you could guide me to the steps/docs to resolve
it.


Thanks,
Divya


[ERROR]: Spark 1.5.2 + Hbase 1.1 + Hive 1.2 + HbaseIntegration

2016-02-29 Thread Divya Gehlot
Hi,
I am getting an error when I try to access a Hive table (which was created
through the HBase integration) from Spark.

Steps I followed :
*Hive Table creation code  *:
CREATE EXTERNAL TABLE IF NOT EXISTS TEST(NAME STRING,AGE INT)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,0:AGE")
TBLPROPERTIES ("hbase.table.name" = "TEST",
"hbase.mapred.output.outputtable" = "TEST");


*DESCRIBE TEST;*
col_name    data_type    comment
name        string       from deserializer
age         int          from deserializer


*Spark Code :*
import org.apache.spark._
import org.apache.spark.sql._

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
hiveContext.sql("from TEST SELECT  NAME").collect.foreach(println)


*Starting Spark shell*
spark-shell --jars
/usr/hdp/2.3.4.0-3485/hive/lib/guava-14.0.1.jar,/usr/hdp/2.3.4.0-3485/hive/lib/hive-hbase-handler.jar,/usr/hdp/2.3.4.0-3485/hbase/lib/hbase-client.jar,/usr/hdp/2.3.4.0-3485/hbase/lib/hbase-common.jar,/usr/hdp/2.3.4.0-3485/hbase/lib/hbase-protocol.jar,/usr/hdp/2.3.4.0-3485/hbase/lib/hbase-server.jar,/usr/hdp/2.3.4.0-3485/hive/lib/hive-hbase-handler.jar,/usr/hdp/2.3.4.0-3485/hive/lib/htrace-core-3.1.0-incubating.jar,/usr/hdp/2.3.4.0-3485/hive/lib/zookeeper-3.4.6.2.3.4.0-3485.jar,/usr/hdp/2.3.4.0-3485/hbase/lib/hbase-server.jar
--driver-class-path
/usr/hdp/2.3.4.0-3485/hive/lib/guava-14.0.1.jar,/usr/hdp/2.3.4.0-3485/hbase/lib/hbase-client.jar,/usr/hdp/2.3.4.0-3485/hbase/lib/hbase-common.jar,/usr/hdp/2.3.4.0-3485/hbase/lib/hbase-protocol.jar,/usr/hdp/2.3.4.0-3485/hbase/lib/hbase-server.jar,/usr/hdp/2.3.4.0-3485/hive/lib/hive-hbase-handler.jar,/usr/hdp/2.3.4.0-3485/hive/lib/htrace-core-3.1.0-incubating.jar,/usr/hdp/2.3.4.0-3485/hive/lib/zookeeper-3.4.6.2.3.4.0-3485.jar,/usr/hdp/2.3.4.0-3485/hive/lib/hive-hbase-handler.jar,/usr/hdp/2.3.4.0-3485/hbase/lib/hbase-server.jar
--packages com.databricks:spark-csv_2.10:1.3.0  --master yarn-client -i
/TestDivya/Spark/InstrumentCopyToHDFSHive.scala

*Stack Trace* :

SQL context available as sqlContext.
> Loading /TestDivya/Spark/InstrumentCopyToHDFSHive.scala...
> import org.apache.spark._
> import org.apache.spark.sql._
> 16/02/29 23:09:29 INFO HiveContext: Initializing execution hive, version
> 1.2.1
> 16/02/29 23:09:29 INFO ClientWrapper: Inspected Hadoop version:
> 2.7.1.2.3.4.0-3485
> 16/02/29 23:09:29 INFO ClientWrapper: Loaded
> org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version
> 2.7.1.2.3.4.0-3485
> 16/02/29 23:09:29 INFO HiveContext: default warehouse location is
> /user/hive/warehouse
> 16/02/29 23:09:29 INFO HiveContext: Initializing HiveMetastoreConnection
> version 1.2.1 using Spark classes.
> 16/02/29 23:09:29 INFO ClientWrapper: Inspected Hadoop version:
> 2.7.1.2.3.4.0-3485
> 16/02/29 23:09:29 INFO ClientWrapper: Loaded
> org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version
> 2.7.1.2.3.4.0-3485
> 16/02/29 23:09:30 WARN NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 16/02/29 23:09:30 INFO metastore: Trying to connect to metastore with URI
> thrift://ip-xxx-xx-xx-xxx.ap-southeast-1.compute.internal:9083
> 16/02/29 23:09:30 INFO metastore: Connected to metastore.
> 16/02/29 23:09:30 WARN DomainSocketFactory: The short-circuit local reads
> feature cannot be used because libhadoop cannot be loaded.
> 16/02/29 23:09:31 INFO SessionState: Created local directory:
> /tmp/1bf53785-f7c8-406d-a733-a5858ccb2d16_resources
> 16/02/29 23:09:31 INFO SessionState: Created HDFS directory:
> /tmp/hive/hdfs/1bf53785-f7c8-406d-a733-a5858ccb2d16
> 16/02/29 23:09:31 INFO SessionState: Created local directory:
> /tmp/hdfs/1bf53785-f7c8-406d-a733-a5858ccb2d16
> 16/02/29 23:09:31 INFO SessionState: Created HDFS directory:
> /tmp/hive/hdfs/1bf53785-f7c8-406d-a733-a5858ccb2d16/_tmp_space.db
> hiveContext: org.apache.spark.sql.hive.HiveContext =
> org.apache.spark.sql.hive.HiveContext@10b14f32
> 16/02/29 23:09:32 INFO ParseDriver: Parsing command: from TEST SELECT  NAME
> 16/02/29 23:09:32 INFO ParseDriver: Parse Completed
> 16/02/29 23:09:33 INFO deprecation: mapred.map.tasks is deprecated.
> Instead, use mapreduce.job.maps
> 16/02/29 23:09:33 INFO MemoryStore: ensureFreeSpace(468352) called with
> curMem=0, maxMem=556038881
> 16/02/29 23:09:33 INFO MemoryStore: Block broadcast_0 stored as values in
> memory (estimated size 457.4 KB, free 529.8 MB)
> 16/02/29 23:09:33 INFO MemoryStore: ensureFreeSpace(49454) called with
> curMem=468352, maxMem=556038881
> 16/02/29 23:09:33 INFO MemoryStore: Block broadcast_0_piece0 stored as
> bytes in memory (estimated size 48.3 KB, free 529.8 MB)
> 16/02/29 23:09:33 INFO BlockManagerInfo: Added broadcast_0_piece0 in
> memory on xxx.xx.xx.xxx:37784 (size: 48.3 KB, free: 530.2 MB)
> 16/02/29 23:09:33 INFO SparkContext: Created broadcast 0 from collect at
> :30
> 16/02/29 23:09:34 INFO HBaseStorageHandler: 

[Error]: Spark 1.5.2 + HiveHbase Integration

2016-02-29 Thread Divya Gehlot
Hi,
I am trying to access a Hive table which has been created using the HBase
integration.
I am able to access the data in the Hive CLI,
but when I try to access the table using the HiveContext of Spark
I get the following error:

> java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/util/Bytes
> at
> org.apache.hadoop.hive.hbase.HBaseSerDe.parseColumnsMapping(HBaseSerDe.java:184)
> at
> org.apache.hadoop.hive.hbase.HBaseSerDeParameters.<init>(HBaseSerDeParameters.java:73)
> at
> org.apache.hadoop.hive.hbase.HBaseSerDe.initialize(HBaseSerDe.java:117)
> at
> org.apache.hadoop.hive.serde2.AbstractSerDe.initialize(AbstractSerDe.java:53)
> at
> org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:521)
> at
> org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:391)
> at
> org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:276)
> at
> org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:258)
> at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:605)
> at
> org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getTableOption$1$$anonfun$3.apply(ClientWrapper.scala:330)
> at
> org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getTableOption$1$$anonfun$3.apply(ClientWrapper.scala:325)



I have added the following jars to the Spark classpath:
/usr/hdp/2.3.4.0-3485/hive/lib/hive-hbase-handler.jar,
/usr/hdp/2.3.4.0-3485/hive/lib/zookeeper-3.4.6.2.3.4.0-3485.jar,
/usr/hdp/2.3.4.0-3485/hive/lib/guava-14.0.1.jar,
/usr/hdp/2.3.4.0-3485/hive/lib/protobuf-java-2.5.0.jar

Which jar files am I missing?


Thanks,
Regards,
Divya


[Query] : How to read null values in Spark 1.5.2

2016-02-24 Thread Divya Gehlot
Hi,
I have a data set (the source is a database) which has null values.
When I define a custom schema with any type other than string,
I get a number format exception on the null values.
Has anybody come across this kind of scenario?
Would really appreciate it if you could share your resolution or workaround.
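
A minimal sketch of one possible workaround (an assumption on my part, using the spark-csv source that appears elsewhere in these posts; the schema, column names and path are placeholders): read the nullable columns as strings first and cast them afterwards, since cast() returns null for values it cannot parse instead of throwing a NumberFormatException.

import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

// Read everything that may contain nulls as StringType first.
val rawSchema = StructType(Seq(
  StructField("id", StringType, nullable = true),
  StructField("amount", StringType, nullable = true)
))

val raw = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .schema(rawSchema)
  .load("/path/to/data")            // placeholder path

// cast() yields null for empty or unparseable values instead of failing the job.
val typed = raw
  .withColumn("id", col("id").cast(IntegerType))
  .withColumn("amount", col("amount").cast(DoubleType))
typed.printSchema()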

Thanks,
Divya


Re: Need help: Does anybody have an HDP cluster on EC2?

2016-02-15 Thread Divya Gehlot
Hi Sabarish,
Thanks a lot for your help.
I am able to view the logs now.

Thank you very much .

Cheers,
Divya


On 15 February 2016 at 16:51, Sabarish Sasidharan <
sabarish.sasidha...@manthan.com> wrote:

> You can setup SSH tunneling.
>
>
> http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-ssh-tunnel.html
>
> Regards
> Sab
>
> On Mon, Feb 15, 2016 at 1:55 PM, Divya Gehlot <divya.htco...@gmail.com>
> wrote:
>
>> Hi,
>> I have hadoop cluster set up in EC2.
>> I am unable to view application logs in Web UI as its taking internal IP
>> Like below :
>> http://ip-xxx-xx-xx-xxx.ap-southeast-1.compute.internal:8042
>> <http://ip-172-31-22-136.ap-southeast-1.compute.internal:8042/>
>>
>> How can I change this to external one or redirecting to external ?
>> Attached screenshots for better understanding of my issue.
>>
>> Would really appreciate help.
>>
>>
>> Thanks,
>> Divya
>>
>>
>>
>>
>>
>
>
>
> --
>
> Architect - Big Data
> Ph: +91 99805 99458
>
> Manthan Systems | *Company of the year - Analytics (2014 Frost and
> Sullivan India ICT)*
> +++
>


Need help: Does anybody have an HDP cluster on EC2?

2016-02-15 Thread Divya Gehlot
Hi,
I have a Hadoop cluster set up on EC2.
I am unable to view the application logs in the Web UI, as it uses the internal IP,
like below:
http://ip-xxx-xx-xx-xxx.ap-southeast-1.compute.internal:8042


How can I change this to the external one, or redirect to an external address?
Attached are screenshots for a better understanding of my issue.

Would really appreciate help.


Thanks,
Divya


Unable to view logs through Web UI + Amazon EC2

2016-01-18 Thread Divya Gehlot
Hi,
I have a Hadoop cluster set up on Amazon EC2.
When I try to access the application logs through the Web UI, I get
"the page can't be displayed".
Configuration of the cluster:
My namenode is mapped to an Elastic IP (static) on EC2.
The other datanodes' public IPs change every day, as we stop the
cluster during non-working hours.

Observation:
When I try to view the logs, it picks one of the datanodes' private IPs, and
I get "the page can't be displayed".

P.S. Attached is the screenshot for your reference.


Data Storage for Joins and ACID transactions + Hadoop Cluster

2016-01-17 Thread Divya Gehlot
Hi,
Which data storage is best for multiple joins at run time in Hadoop?
I tried Hive, but the performance is poor.
Pointers/guidance appreciated.


Thanks,
Regards,
Divya


Pig Java UDF : ERROR 1066: Unable to open iterator for alias

2015-07-10 Thread Divya Gehlot
Pig Stack Trace
---
ERROR 1066: Unable to open iterator for alias C

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to
open iterator for alias C
at org.apache.pig.PigServer.openIterator(PigServer.java:892)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:774)
at
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:607)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
at org.apache.pig.PigServer.openIterator(PigServer.java:884)
... 13 more
  Application Log
---
Application application_1436453941326_0020 failed 2 times due to AM
Container for appattempt_1436453941326_0020_02 exited with exitCode: 1
For more detailed output, check application tracking page:
http://quickstart.cloudera:8088/proxy/application_1436453941326_0020/
Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1436453941326_0020_02_01
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.


return tuple of key value pair from string in java UDF

2015-07-07 Thread Divya Gehlot
Hi,
I have a Pig script which returns a field as
Key=ValKey=ValKey=ValKey=ValKey=Val, whose schema is defined as
data:chararray in my Pig script.
Can I use a Java UDF to return a tuple of key-value pairs?

Would really appreciate it if somebody could point me to a guide or an example
of this kind of scenario.


Thanks


how to write custom log loader and store in JSON format

2015-07-03 Thread Divya Gehlot
Hi,
I am new to Pig, and I have a log file in the below format:
(Message,NIL,2015-07-01,22:58:53.66,E,xx.xxx.x.xxx,12,0xd6,BIZ,Componentname,0,0.0,key_1=valueKEY_2=KEY_3=VALUEKEY_4=AUKEY_5=COMPANYKEY_6=VALUEKEY_7=1222KEY_8=VALUEKEY_9=VALUEKEY_10=VALUEKEY_10=VALUE)


for which I need to write a Pig script and store the output in the below JSON format:
{Message1:Message,date:2015-07-01,Time:22:58:53.66,E:E,machine
:xx.xxx.x.xxx,data:{key_1:value,key_2:value,key_3:value,key_3:value,key_3:value,key_5:value.}
}

Can somebody help me in writing a custom loader?

Would really appreciate your help.

Thanks,


Re: copy data from one hadoop cluster to another hadoop cluster + cant use distcp

2015-06-23 Thread Divya Gehlot
Can you please elaborate on it a bit more?
On 20 Jun 2015 2:46 pm, SF Hadoop sfhad...@gmail.com wrote:

 Really depends on your requirements for the format of the data.

 The easiest way I can think of is to stream batches of data into a pub
 sub system that the target system can access and then consume.

 Verify each batch and then ditch them.

 You can throttle the size of the intermediary infrastructure based on your
 batches.

 Seems the most efficient approach.

 On Thursday, June 18, 2015, Divya Gehlot divya.htco...@gmail.com wrote:

 Hi,
 I need to copy data from first hadoop cluster to second hadoop cluster.
 I cant access second hadoop cluster from first hadoop cluster due to some
 security issue.
 Can any point me how can I do apart from distcp command.
 For instance
 Cluster 1 secured zone - copy hdfs data  to - cluster 2 in non secured
 zone



 Thanks,
 Divya





Re: copy data from one hadoop cluster to another hadoop cluster + cant use distcp

2015-06-19 Thread Divya Gehlot
In that case it will be a three-step process:
1. first cluster (secure zone) HDFS -> copyToLocal -> user's local file
system
2. user's local space -> copy data -> second cluster's user local file system
3. second cluster's user local file system -> copyFromLocal -> second
cluster's HDFS

Am I on the right track?



On 19 June 2015 at 12:38, Nitin Pawar nitinpawar...@gmail.com wrote:

 What's the size of the data?
 If you can not do distcp between clusters then other way is doing hdfs get
 on the data and then hdfs put on another cluster
 On 19-Jun-2015 9:56 am, Divya Gehlot divya.htco...@gmail.com wrote:

 Hi,
 I need to copy data from first hadoop cluster to second hadoop cluster.
 I cant access second hadoop cluster from first hadoop cluster due to some
 security issue.
 Can any point me how can I do apart from distcp command.
 For instance
 Cluster 1 secured zone - copy hdfs data  to - cluster 2 in non secured
 zone



 Thanks,
 Divya





copy data from one hadoop cluster to another hadoop cluster + cant use distcp

2015-06-18 Thread Divya Gehlot
Hi,
I need to copy data from the first Hadoop cluster to a second Hadoop cluster.
I can't access the second Hadoop cluster from the first due to a
security issue.
Can anyone point me to how I can do this, apart from using the distcp command?
For instance:
Cluster 1 (secured zone) -> copy HDFS data to -> cluster 2 in a non-secured
zone



Thanks,
Divya