Exception: Can't get Master Kerberos principal for use as renewer
sc.textFile("hdfs://vm1.comp.com:8020/user/myusr/temp/file1").collect().foreach(println)
// Getting this error: java.io.IOException: Can't get Master Kerberos principal for use as renewer
}
}
On Mon,
Has anyone used
https://www.codatlas.com/github.com/apache/spark/HEAD/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
to interact with a secured Hadoop cluster from Spark?
Thanks,
Ajay
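A minimal sketch of one common approach (not confirmed as this thread's solution): log in with a Kerberos keytab via UserGroupInformation before creating the SparkContext, with HADOOP_CONF_DIR pointing at the cluster's client configs. The principal, keytab path, and HDFS URI below are placeholders.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.{SparkConf, SparkContext}

object SecureHdfsRead {
  def main(args: Array[String]): Unit = {
    // Picks up core-site.xml / hdfs-site.xml from HADOOP_CONF_DIR
    val hadoopConf = new Configuration()
    UserGroupInformation.setConfiguration(hadoopConf)
    // Placeholder principal and keytab path; log in before creating the SparkContext
    UserGroupInformation.loginUserFromKeytab("myusr@EXAMPLE.COM", "/path/to/myusr.keytab")

    val sc = new SparkContext(new SparkConf().setAppName("secure-read").setMaster("local[*]"))
    sc.textFile("hdfs://vm1.comp.com:8020/user/myusr/temp/file1").collect().foreach(println)
    sc.stop()
  }
}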
On Mon, Nov 7, 2016 at 4:37 PM, Ajay Chander wrote:
>
> Hi Everyone,
>
> I am trying
Hi Everyone,
I am trying to develop a simple codebase on my machine to read data from a
secured Hadoop cluster. We have a development cluster which is secured
through Kerberos, and I want to run a Spark job from my IntelliJ to read
some sample data from the cluster. Has anyone done this before? Can
ely.
>
> The NPE you see is an unrelated cosmetic problem that was fixed in 2.0.1
> IIRC.
>
> On Wed, Oct 26, 2016 at 4:28 AM, Ajay Chander wrote:
>
>> Hi Everyone,
>>
>> I was thinking if I can use hiveContext inside foreach like below,
>>
>> o
ame in main, you can register it as a table
> and run the queries in main method itself. You don't need to coalesce or
> run the method within foreach.
>
> Regards
> Sunita
>
> On Tuesday, October 25, 2016, Ajay Chander wrote:
>
>>
>> Jeff, Thanks for y
e, you can use hiveContext in the foreach as it is a Scala
> List foreach operation which runs on the driver side. But you cannot use
> hiveContext in RDD.foreach.
>
>
>
> On Wed, Oct 26, 2016 at 11:28 AM, Ajay Chander wrote:
>
>> Hi Everyone,
>>
>> I was thinking if I can use hiveCo
Hi Everyone,
I was wondering if I can use hiveContext inside foreach like below,
object Test {
def main(args: Array[String]): Unit = {
val conf = new SparkConf()
val sc = new SparkContext(conf)
val hiveContext = new HiveContext(sc)
val dataElementsFile = args(0)
val deDF =
ds_nm", "cyc_dt").mode("Append"
).insertInto("devl_df2_spf_batch.spf_supplier_trans_metric_detl_base_1")
}
}
}
This is my cluster (Spark 1.6.0 on YARN, Cloudera 5.7.1) configuration:
Memory -> 4.10 TB
VCores -> 544
I am deploying the application in yarn
Hi Everyone,
Can anyone tell me if there is anything wrong with my code flow below?
Based on each element from the text file I would like to run a query
against a Hive table and persist the results in another Hive table. I want to do
this in parallel for each element in the file. I appreciate any of your inputs.
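For reference, here is a minimal driver-side sketch of the pattern discussed in this thread (hiveContext used inside a plain Scala collection foreach on the driver, not inside RDD.foreach); the paths, table names, and column name are hypothetical:

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
// Bring the elements to the driver; the loop below runs on the driver only
val elements = sc.textFile("hdfs://namenode:8020/user/myusr/data_elements.txt").collect()

elements.foreach { elem =>
  // hiveContext is usable here because this is a local collection foreach
  val result = hiveContext.sql(
    s"SELECT * FROM source_db.source_table WHERE data_element = '$elem'")
  result.write.mode("append").insertInto("target_db.target_table")
}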
| Zander| Turtle| Frog|
> | 7| Dogs| Sparrow|Goldfish| NULL|Salamander|
> +---+-------+-------+--------+--------+----------+
>
> >>> cnr = sqlContext.sql(sql)
> >>> cnr.show()
> +---+-------+-----+----+--------+----------+
> | id|Mammals|Birds|Fish|Re
d api, so it will be read sequentially.
>>
>> Furthermore, you are going to need to create a schema if you want to use
>> DataFrames.
>>
>> On 5/10/2016 at 1:53, "Ajay Chander" wrote:
>>
>>> Right now, I am doing it like below,
>>>
On Wed, Oct 5, 2016 at 12:42 AM, Daniel wrote:
>>
>>> First of all, if you want to read a txt file in Spark, you should use
>>> sc.textFile, because you are using "Source.fromFile", so you are reading it
>>> with the Scala standard API, so it will be read s
e {
println("Animal Type: "+anmtyp+" has > 10 distinct values")
}
}
But the problem is it is running sequentially.
Any inputs are appreciated. Thank you.
Regards,
Ajay
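As a hedged aside, one way to avoid looping over the animal types sequentially is to compute every distinct count in a single aggregation; this sketch assumes DF2 is available as a DataFrame named df2 with one column per animal type listed in DF1 (column names taken from the thread):

import org.apache.spark.sql.functions.countDistinct

val animalTypes = Seq("Mammals", "Birds", "Fish", "Reptiles", "Amphibians")
// One pass over df2 computing all distinct counts at once
val counts = df2.agg(
  countDistinct(animalTypes.head).as(animalTypes.head),
  animalTypes.tail.map(c => countDistinct(c).as(c)): _*)

val row = counts.collect()(0)
animalTypes.filter(t => row.getAs[Long](t) > 10)
  .foreach(t => println(s"Animal Type: $t has > 10 distinct values"))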
On Tue, Oct 4, 2016 at 7:44 PM, Ajay Chander wrote:
> Hi Everyone,
>
> I have a us
Hi Everyone,
I have a use case where I have two DataFrames like below:
1) The first DataFrame (DF1) contains,
*ANIMALS*
Mammals
Birds
Fish
Reptiles
Amphibians
2) The second DataFrame (DF2) contains,
*ID, Mammals, Birds, Fish, Reptiles, Amphibians*
1, Dogs, Eagle, Goldfish,
Hi Everyone,
First of all, let me explain what I am trying to do, and I apologize for
writing a lengthy mail.
1) Programmatically connect to a remote secured (Kerberized) Hadoop cluster (CDH
5.7) from my local machine.
- Once connected, I want to read the data from a remote Hive table into
Spark
>>>>>> // maropu
>>>>>>
>>>>>>
>>>>>> On Sun, Sep 11, 2016 at 12:37 AM, Mich Talebzadeh <
>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>> Strange that Oracle table of
Hello Everyone,
My goal is to use Spark SQL to load a huge amount of data from Oracle to HDFS.
*Table in Oracle:*
1) no primary key.
2) Has 404 columns.
3) Has 200,800,000 rows.
*Spark SQL:*
In my Spark SQL job I want to read the data into n partitions in
parallel, for which I need to provide the partitioning details.
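As a rough illustration only (the connection string, schema, credentials, and the ORA_HASH-based synthetic partition column are all assumptions, since the table has no primary key), a partitioned JDBC read with an existing sqlContext might look like this:

val oracleDF = sqlContext.read.format("jdbc").options(Map(
  "url"             -> "jdbc:oracle:thin:@//dbhost:1521/SERVICE",   // placeholder connection string
  "driver"          -> "oracle.jdbc.OracleDriver",
  // Synthesize a numeric partition key because the table has no primary key
  "dbtable"         -> "(SELECT t.*, ORA_HASH(ROWID, 63) AS part_key FROM myschema.big_table t) extract",
  "partitionColumn" -> "part_key",
  "lowerBound"      -> "0",
  "upperBound"      -> "64",
  "numPartitions"   -> "64",
  "user"            -> "myuser",
  "password"        -> "mypassword")).load()

// Write the extract to HDFS as Parquet (path is a placeholder)
oracleDF.write.mode("overwrite").parquet("hdfs://namenode:8020/user/myusr/oracle_extract")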
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
Hi Mich,
Right now I have a similar use case where I have to delete some rows from a
Hive table. My Hive table is ORC, bucketed, and has the transactional
property enabled. I can delete from the Hive shell but not from my
spark-shell or Spark app. Were you able to find any workaround? Thank you.
Regards,
> As a workaround you can write the select statement yourself instead of just
> providing the table name.
>
> On Jun 11, 2016, at 6:27 PM, Ajay Chander wrote:
>
> I tried implementing the same functionality through Scala as well. But no
> luck so far. Just wondering if anyone
I tried implementing the same functionality through Scala as well. But no
luck so far. Just wondering if anyone here has tried using Spark SQL to read a
SAS dataset? Thank you
Regards,
Ajay
On Friday, June 10, 2016, Ajay Chander wrote:
> Mich, I completely agree with you. I built another Spark
ID
> , CLUSTERED
> , SCATTERED
> , RANDOMISED
> , RANDOM_STRING
> , SMALL_VC
> , PADDING
> FROM tmp
> """
>HiveContext.sql(sqltext)
> println ("\nFinished at"); sqlContext.sql("SELE
28017
> 18 10419
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
Hi again, has anyone in this group tried to access a SAS dataset through Spark
SQL? Thank you
Regards,
Ajay
On Friday, June 10, 2016, Ajay Chander wrote:
> Hi Spark Users,
>
> I hope everyone here are doing great.
>
> I am trying to read data from SAS through Spark SQL and
Hi Spark Users,
I hope everyone here is doing great.
I am trying to read data from SAS through Spark SQL and write it into HDFS.
Initially, I started with a pure Java program; please find the program and
logs in the attached file sas_pure_java.txt. My program ran successfully
and it returned the data
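For the Spark SQL side, a heavily hedged sketch of the generic JDBC read path follows; the SAS JDBC URL and driver class below are placeholders for whatever driver the SAS environment actually exposes, not verified values, and sqlContext is assumed to already exist:

val sasDF = sqlContext.read.format("jdbc").options(Map(
  "url"      -> "jdbc:sas://sashost:8591/mylib",   // hypothetical URL
  "driver"   -> "com.example.sas.Driver",          // hypothetical driver class
  "dbtable"  -> "mylib.mydataset",
  "user"     -> "sasuser",
  "password" -> "saspassword")).load()

// Land the data in HDFS (path is a placeholder)
sasDF.write.mode("overwrite").parquet("hdfs://namenode:8020/user/myusr/sas_extract")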
> But you can maintain a file, e.g. extractRange.conf in HDFS, to read the end
> range from it and have the Spark job update it with the new end range before it
> finishes, so the relevant ranges can be used next time.
>
> On Tue, Jun 7, 2016 at 8:49 PM, Ajay Chander wrote:
>
>>
ur hd
>> 2. use spark-streaming to read data from that directory and store it
>> into hdfs
>>
>> perhaps there is some sort of spark 'connectors' that allows you to read
>> data from a db directly so you don't need to go via Spark Streaming?
>>
>>
Hi Spark users,
Right now we are using Spark for everything (loading the data from
SQL Server, applying transformations, saving it as permanent tables in Hive) in
our environment. Everything is being done in one Spark application.
The only thing we do before we launch our Spark application through
oozie
Hi Vikash,
These are my thoughts: read the input directory using wholeTextFiles(),
which gives a paired RDD with the file name as key and the file content as
value. Then you can apply a map function to read each line and append the key
(the file name) to the content, for example as sketched below.
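A minimal sketch of that idea, assuming an existing SparkContext sc, placeholder paths, and a comma as the separator (adjust as needed):

// (fileName, fileContent) pairs; works best for reasonably small text files
val pairs = sc.wholeTextFiles("hdfs://namenode:8020/user/myusr/input_dir")

val linesWithFileName = pairs.flatMap { case (fileName, content) =>
  content.split("\n").map(line => s"$line,$fileName")   // append the file name to every line
}

linesWithFileName.saveAsTextFile("hdfs://namenode:8020/user/myusr/output_dir")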
Thank you,
Aj
On Tuesday, May 31, 2016, Vikash Kumar
Hi Everyone, Any insights on this thread? Thank you.
On Friday, May 27, 2016, Ajay Chander wrote:
> Hi Everyone,
>
>I have some data located on the EdgeNode. Right
> now, the process I follow to copy the data from Edgenode to HDFS is through
> a sh
Hi Everyone,
I have some data located on the edge node. Right
now, the process I follow to copy the data from the edge node to HDFS is through
a shell script which resides on the edge node. In Oozie I am using an SSH action
to execute the shell script on the edge node, which copies the dat
wn where the issue is ?
>
>
> Sent from my iPhone
>
> On May 23, 2016, at 5:26 PM, Ajay Chander wrote:
>
> I downloaded the Spark 1.5 utilities and exported SPARK_HOME pointing to
> it. I copied all the cluster configuration files (hive-site.xml,
> hdfs-site.xml etc.
gards,
Aj
On Monday, May 23, 2016, Ajay Chander wrote:
> Hi Everyone,
>
> I am building a Java Spark application in eclipse IDE. From my application
> I want to use hiveContext to read tables from the remote Hive(Hadoop
> cluster). On my machine I have exported $HADOOP_CONF_DIR =
Hi Everyone,
I am building a Java Spark application in the Eclipse IDE. From my application
I want to use hiveContext to read tables from the remote Hive (Hadoop
cluster). On my machine I have exported $HADOOP_CONF_DIR =
{$HOME}/hadoop/conf/. This path has all the remote cluster conf details
like hive-
Never mind! I figured it out by saving it as hadoopfile and passing the
codec to it. Thank you!
On Tuesday, May 10, 2016, Ajay Chander wrote:
> Hi, I have a folder temp1 in hdfs which has files of multiple formats,
> test1.txt, test2.avsc (Avro file), in it. Now I want to compress these
it. Is there any possible/efficient way to achieve this?
Thanks,
Aj
On Tuesday, May 10, 2016, Ajay Chander wrote:
> I will try that out. Thank you!
>
> On Tuesday, May 10, 2016, Deepak Sharma wrote:
>
>> Yes that's what I intended to say.
>>
>> Thank
Hi Ajay
> You can look at the wholeTextFiles method, which gives an RDD[(String, String)], and then map
> each RDD to saveAsTextFile.
> This will serve the purpose.
> I don't think anything like distcp exists in Spark by default.
>
> Thanks
> Deepak
> On 10 May 2016 11:27 pm, "Aja
I will try that out. Thank you!
On Tuesday, May 10, 2016, Deepak Sharma wrote:
> Yes that's what I intended to say.
>
> Thanks
> Deepak
> On 10 May 2016 11:47 pm, "Ajay Chander" wrote:
>
>> Hi Deepak,
>>Thanks for your response. If I
Hi Everyone,
We are planning to migrate data between 2 clusters, and I see distcp
doesn't support data compression. Is there any efficient way to compress
the data during the migration? Can I implement a Spark job to do this?
Thanks.
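Along the lines suggested earlier in this thread, a minimal sketch for plain-text data only, assuming an existing SparkContext sc; the host names and paths are placeholders, and both namenodes must be reachable from the job:

import org.apache.hadoop.io.compress.GzipCodec

// Read from the source cluster and write gzip-compressed files to the target cluster
val src = sc.textFile("hdfs://source-nn:8020/data/to_migrate")
src.saveAsTextFile("hdfs://target-nn:8020/data/migrated", classOf[GzipCodec])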
Mich,
Can you try the value for paymentdata in this
format, paymentdata = '2015-01-01 23:59:59', with to_date(paymentdate), and see if
it helps.
On Thursday, March 24, 2016, Tamas Szuromi
wrote:
> Hi Mich,
>
> Take a look
> https://spark.apache.org/docs/1.6.1/api/java/org/apache/spark/sql/functions.ht
Hi Everyone, a quick question within this context. What is the underlying
persistent storage that you are using with regard to this
containerized environment? Thanks
On Thursday, March 10, 2016, yanlin wang wrote:
> How do you guys make the driver within the docker container reachable from
>
Hi Ashok,
Try using HiveContext instead of SQLContext. I suspect SQLContext does not
have that functionality. Let me know if it works.
Thanks,
Ajay
On Friday, March 4, 2016, ashokkumar rajendran <
ashokkumar.rajend...@gmail.com> wrote:
> Hi Ayan,
>
> Thanks for the response. I am using SQL query
k dependencies. The link I mentioned before is the one
> you could follow, please read my previous mail.
>
> Thanks
> Saisai
>
>
>
> On Thu, Oct 22, 2015 at 1:56 AM, Ajay Chander wrote:
>
>> Thanks for your kind inputs. Right now I am running spark-1.3.1 on YARN(4
Hi Everyone,
I have a use case where I have to create a DataFrame inside the map()
function. To create a DataFrame, I need a sqlContext or hiveContext. Now how
do I pass the context to my map function? And I am doing it in Java. I
tried creating a class "TestClass" which implements "Function"
and i
>> FB: http://www.facebook.com/meruvian
>> TW: http://www.twitter.com/meruvian / @meruvian
>> Website: http://www.meruvian.org
>>
>> "We grow because we share the same belief."
>>
>>
>> On Wed, Oct 21, 2015 at 12:24 PM, Doug Balog wr
Hi Everyone,
Does anyone have any idea if spark-1.5.1 is available as a service on
Hortonworks? I have spark-1.3.1 installed on the cluster, and it is a
Hortonworks distribution. Now I want to upgrade it to spark-1.5.1. Does anyone here
have any idea about it? Thank you in advance.
Regards,
Ajay
Hi Jacin,
If I were you, the first thing I would do is write a sample Java
application to write data into HDFS and see if it's working fine. Metadata
is being created in HDFS; that means communication to the namenode is working
fine but not to the datanodes, since you don't see any data inside the file.
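A small standalone check along those lines (written in Scala here to match the rest of this digest's code; the namenode URI and test path are placeholders): if the write succeeds and the file has a non-zero length, both the namenode and the datanodes are reachable.

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsWriteTest {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new URI("hdfs://namenode:8020"), new Configuration())
    val testPath = new Path("/tmp/hdfs_write_test.txt")

    val out = fs.create(testPath)              // talks to the namenode to allocate blocks
    out.write("hello hdfs".getBytes("UTF-8"))  // the actual bytes go to the datanodes
    out.close()

    println("written bytes: " + fs.getFileStatus(testPath).getLen)
    fs.close()
  }
}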
4/08/22/spark-submit-in-java/>.
>>
>> I heard rumors that there is some work going on to prepare a Submit API,
>> but I am not a contributor, and I can't say whether it is true or how
>> the work is going.
>> For now the suggested way is to use the provi
Mitchell wrote:
> Hi Ajay,
>
> Are you trying to save to your local file system or to HDFS?
>
> // This would save to HDFS under "/user/hadoop/counter"
> counter.saveAsTextFile("/user/hadoop/counter");
>
> David
>
>
> On Sun, Aug 30, 2015
Hi Everyone,
Recently we have installed Spark on YARN in a Hortonworks cluster. Now I am
trying to run a wordcount program in my Eclipse, and I
did setMaster("local"), and I see the results as expected. Now I want
to submit the same job to my YARN cluster from my Eclipse. In Storm
basically I w