Exception: Can't get Master Kerberos principal for use as renewer
sc.textFile("hdfs://vm1.comp.com:8020/user/myusr/temp/file1").collect().foreach(println)
// Getting this error: java.io.IOException: Can't get Master Kerberos principal for use as renewer
}
}
On Mon,
Has anyone used
https://www.codatlas.com/github.com/apache/spark/HEAD/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
to interact with a secured Hadoop cluster from Spark?
Thanks,
Ajay
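A minimal sketch of one common approach (not confirmed as this thread's solution): log in with a Kerberos keytab via UserGroupInformation before creating the SparkContext, with HADOOP_CONF_DIR pointing at the cluster's client configs. The principal, keytab path, and HDFS URI below are placeholders.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.{SparkConf, SparkContext}

object SecureHdfsRead {
  def main(args: Array[String]): Unit = {
    // Picks up core-site.xml / hdfs-site.xml from HADOOP_CONF_DIR
    val hadoopConf = new Configuration()
    UserGroupInformation.setConfiguration(hadoopConf)
    // Placeholder principal and keytab path; log in before creating the SparkContext
    UserGroupInformation.loginUserFromKeytab("myusr@EXAMPLE.COM", "/path/to/myusr.keytab")

    val sc = new SparkContext(new SparkConf().setAppName("secure-read").setMaster("local[*]"))
    sc.textFile("hdfs://vm1.comp.com:8020/user/myusr/temp/file1").collect().foreach(println)
    sc.stop()
  }
}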
On Mon, Nov 7, 2016 at 4:37 PM, Ajay Chander wrote:
>
> Hi Everyone,
>
> I am trying
Hi Everyone,
I am trying to develop a simple codebase on my machine to read data from a
secured Hadoop cluster. We have a development cluster which is secured
through Kerberos, and I want to run a Spark job from my IntelliJ to read
some sample data from the cluster. Has anyone done this before? Can
ely.
>
> The NPE you see is an unrelated cosmetic problem that was fixed in 2.0.1
> IIRC.
>
> On Wed, Oct 26, 2016 at 4:28 AM, Ajay Chander wrote:
>
>> Hi Everyone,
>>
>> I was thinking if I can use hiveContext inside foreach like below,
>>
>> o
ame in main, you can register it as a table
> and run the queries in main method itself. You don't need to coalesce or
> run the method within foreach.
>
> Regards
> Sunita
>
> On Tuesday, October 25, 2016, Ajay Chander wrote:
>
>>
>> Jeff, Thanks for y
e, you can use hiveContext in the foreach as it is a Scala
> List foreach operation which runs on the driver side. But you cannot use
> hiveContext in RDD.foreach.
>
>
>
> On Wed, Oct 26, 2016 at 11:28 AM, Ajay Chander wrote:
>
>> Hi Everyone,
>>
>> I was thinking if I can use hiveCo
Hi Everyone,
I was wondering if I can use hiveContext inside foreach like below,
object Test {
def main(args: Array[String]): Unit = {
val conf = new SparkConf()
val sc = new SparkContext(conf)
val hiveContext = new HiveContext(sc)
val dataElementsFile = args(0)
val deDF =
ds_nm", "cyc_dt").mode("Append"
).insertInto("devl_df2_spf_batch.spf_supplier_trans_metric_detl_base_1")
}
}
}
This is my cluster (Spark 1.6.0 on YARN, Cloudera 5.7.1) configuration:
Memory -> 4.10 TB
VCores -> 544
I am deploying the application in yarn
Hi Everyone,
Can anyone tell me if there is anything wrong with my code flow below?
Based on each element from the text file I would like to run a query
against a Hive table and persist the results in another Hive table. I want to do
this in parallel for each element in the file. I appreciate any of your inputs.
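For reference, here is a minimal driver-side sketch of the pattern discussed in this thread (hiveContext used inside a plain Scala collection foreach on the driver, not inside RDD.foreach); the paths, table names, and column name are hypothetical:

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
// Bring the elements to the driver; the loop below runs on the driver only
val elements = sc.textFile("hdfs://namenode:8020/user/myusr/data_elements.txt").collect()

elements.foreach { elem =>
  // hiveContext is usable here because this is a local collection foreach
  val result = hiveContext.sql(
    s"SELECT * FROM source_db.source_table WHERE data_element = '$elem'")
  result.write.mode("append").insertInto("target_db.target_table")
}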
| Zander| Turtle| Frog|
> | 7| Dogs| Sparrow|Goldfish| NULL|Salamander|
> +---+-------+-------+--------+--------+----------+
>
> >>> cnr = sqlContext.sql(sql)
> >>> cnr.show()
> +---+-------+-----+----+--------+----------+
> | id|Mammals|Birds|Fish|Re
d api, so it will be read sequentially.
>>
>> Furthermore, you are going to need to create a schema if you want to use
>> DataFrames.
>>
>> On 5/10/2016 at 1:53, "Ajay Chander" wrote:
>>
>>> Right now, I am doing it like below,
>>>
On Wed, Oct 5, 2016 at 12:42 AM, Daniel wrote:
>>
>>> First of all, if you want to read a txt file in Spark, you should use
>>> sc.textFile, because you are using "Source.fromFile", so you are reading it
>>> with the Scala standard API, so it will be read s
e {
println("Animal Type: "+anmtyp+" has > 10 distinct values")
}
}
But the problem is it is running sequentially.
Any inputs are appreciated. Thank you.
Regards,
Ajay
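As a hedged aside, one way to avoid looping over the animal types sequentially is to compute every distinct count in a single aggregation; this sketch assumes DF2 is available as a DataFrame named df2 with one column per animal type listed in DF1 (column names taken from the thread):

import org.apache.spark.sql.functions.countDistinct

val animalTypes = Seq("Mammals", "Birds", "Fish", "Reptiles", "Amphibians")
// One pass over df2 computing all distinct counts at once
val counts = df2.agg(
  countDistinct(animalTypes.head).as(animalTypes.head),
  animalTypes.tail.map(c => countDistinct(c).as(c)): _*)

val row = counts.collect()(0)
animalTypes.filter(t => row.getAs[Long](t) > 10)
  .foreach(t => println(s"Animal Type: $t has > 10 distinct values"))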
On Tue, Oct 4, 2016 at 7:44 PM, Ajay Chander wrote:
> Hi Everyone,
>
> I have a us
Hi Everyone,
I have a use case where I have two DataFrames like below:
1) The first DataFrame (DF1) contains,
*ANIMALS*
Mammals
Birds
Fish
Reptiles
Amphibians
2) The second DataFrame (DF2) contains,
*ID, Mammals, Birds, Fish, Reptiles, Amphibians*
1, Dogs, Eagle, Goldfish,
Hi Everyone,
First of all, let me explain what I am trying to do, and I apologize for
writing a lengthy mail.
1) Programmatically connect to a remote secured (Kerberized) Hadoop cluster (CDH
5.7) from my local machine.
- Once connected, I want to read the data from a remote Hive table into
Spark
>>>>>> // maropu
>>>>>>
>>>>>>
>>>>>> On Sun, Sep 11, 2016 at 12:37 AM, Mich Talebzadeh <
>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>> Strange that Oracle table of
Hello Everyone,
My goal is to use Spark SQL to load a huge amount of data from Oracle to HDFS.
*Table in Oracle:*
1) no primary key.
2) Has 404 columns.
3) Has 200,800,000 rows.
*Spark SQL:*
In my Spark SQL job I want to read the data into n partitions in
parallel, for which I need to provide the partitioning details.
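As a rough illustration only (the connection string, schema, credentials, and the ORA_HASH-based synthetic partition column are all assumptions, since the table has no primary key), a partitioned JDBC read with an existing sqlContext might look like this:

val oracleDF = sqlContext.read.format("jdbc").options(Map(
  "url"             -> "jdbc:oracle:thin:@//dbhost:1521/SERVICE",   // placeholder connection string
  "driver"          -> "oracle.jdbc.OracleDriver",
  // Synthesize a numeric partition key because the table has no primary key
  "dbtable"         -> "(SELECT t.*, ORA_HASH(ROWID, 63) AS part_key FROM myschema.big_table t) extract",
  "partitionColumn" -> "part_key",
  "lowerBound"      -> "0",
  "upperBound"      -> "64",
  "numPartitions"   -> "64",
  "user"            -> "myuser",
  "password"        -> "mypassword")).load()

// Write the extract to HDFS as Parquet (path is a placeholder)
oracleDF.write.mode("overwrite").parquet("hdfs://namenode:8020/user/myusr/oracle_extract")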
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
Hi Mich,
Right now I have a similar use case where I have to delete some rows from a
Hive table. My Hive table is ORC, bucketed, and has the transactional
property enabled. I can delete from the Hive shell but not from my
spark-shell or Spark app. Were you able to find any workaround? Thank you.
Regards,
> As a workaround you can write the select statement yourself instead of just
> providing the table name.
>
> On Jun 11, 2016, at 6:27 PM, Ajay Chander wrote:
>
> I tried implementing the same functionality through Scala as well. But no
> luck so far. Just wondering if anyone
I tried implementing the same functionality through Scala as well. But no
luck so far. Just wondering if anyone here has tried using Spark SQL to read a
SAS dataset? Thank you
Regards,
Ajay
On Friday, June 10, 2016, Ajay Chander wrote:
> Mich, I completely agree with you. I built another Spark
ID
> , CLUSTERED
> , SCATTERED
> , RANDOMISED
> , RANDOM_STRING
> , SMALL_VC
> , PADDING
> FROM tmp
> """
>HiveContext.sql(sqltext)
> println ("\nFinished at"); sqlContext.sql("SELE
28017
> 18 10419
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
Hi again, has anyone in this group tried to access a SAS dataset through Spark
SQL? Thank you
Regards,
Ajay
On Friday, June 10, 2016, Ajay Chander wrote:
> Hi Spark Users,
>
> I hope everyone here are doing great.
>
> I am trying to read data from SAS through Spark SQL and
Hi Spark Users,
I hope everyone here is doing great.
I am trying to read data from SAS through Spark SQL and write it into HDFS.
Initially, I started with a pure Java program; please find the program and
logs in the attached file sas_pure_java.txt. My program ran successfully
and it returned the data
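For the Spark SQL side, a heavily hedged sketch of the generic JDBC read path follows; the SAS JDBC URL and driver class below are placeholders for whatever driver the SAS environment actually exposes, not verified values, and sqlContext is assumed to already exist:

val sasDF = sqlContext.read.format("jdbc").options(Map(
  "url"      -> "jdbc:sas://sashost:8591/mylib",   // hypothetical URL
  "driver"   -> "com.example.sas.Driver",          // hypothetical driver class
  "dbtable"  -> "mylib.mydataset",
  "user"     -> "sasuser",
  "password" -> "saspassword")).load()

// Land the data in HDFS (path is a placeholder)
sasDF.write.mode("overwrite").parquet("hdfs://namenode:8020/user/myusr/sas_extract")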
> But you can maintain a file, e.g. extractRange.conf in HDFS, to read the end
> range from it and have the Spark job update it with the new end range before it
> finishes, so the relevant ranges can be used next time.
>
> On Tue, Jun 7, 2016 at 8:49 PM, Ajay Chander wrote:
>
>>
ur hd
>> 2. use spark-streaming to read data from that directory and store it
>> into hdfs
>>
>> perhaps there is some sort of spark 'connectors' that allows you to read
>> data from a db directly so you don't need to go via Spark Streaming?
>>
>>
Hi Spark users,
Right now we are using Spark for everything (loading the data from
SQL Server, applying transformations, saving it as permanent tables in Hive) in
our environment. Everything is being done in one Spark application.
The only thing we do before we launch our Spark application through
oozie
Hi Vikash,
These are my thoughts: read the input directory using wholeTextFiles(),
which gives a paired RDD with the file name as key and the file content as
value. Then you can apply a map function to read each line and append the key
(the file name) to the content, for example as sketched below.
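A minimal sketch of that idea, assuming an existing SparkContext sc, placeholder paths, and a comma as the separator (adjust as needed):

// (fileName, fileContent) pairs; works best for reasonably small text files
val pairs = sc.wholeTextFiles("hdfs://namenode:8020/user/myusr/input_dir")

val linesWithFileName = pairs.flatMap { case (fileName, content) =>
  content.split("\n").map(line => s"$line,$fileName")   // append the file name to every line
}

linesWithFileName.saveAsTextFile("hdfs://namenode:8020/user/myusr/output_dir")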
Thank you,
Aj
On Tuesday, May 31, 2016, Vikash Kumar
Hi Everyone, Any insights on this thread? Thank you.
On Friday, May 27, 2016, Ajay Chander wrote:
> Hi Everyone,
>
>I have some data located on the EdgeNode. Right
> now, the process I follow to copy the data from Edgenode to HDFS is through
> a sh
Hi Everyone,
I have some data located on the edge node. Right
now, the process I follow to copy the data from the edge node to HDFS is through
a shell script which resides on the edge node. In Oozie I am using an SSH action
to execute the shell script on the edge node, which copies the dat
wn where the issue is ?
>
>
> Sent from my iPhone
>
> On May 23, 2016, at 5:26 PM, Ajay Chander wrote:
>
> I downloaded the Spark 1.5 utilities and exported SPARK_HOME pointing to
> it. I copied all the cluster configuration files (hive-site.xml,
> hdfs-site.xml etc.
gards,
Aj
On Monday, May 23, 2016, Ajay Chander wrote:
> Hi Everyone,
>
> I am building a Java Spark application in eclipse IDE. From my application
> I want to use hiveContext to read tables from the remote Hive(Hadoop
> cluster). On my machine I have exported $HADOOP_CONF_DIR =
Hi Everyone,
I am building a Java Spark application in the Eclipse IDE. From my application
I want to use hiveContext to read tables from the remote Hive (Hadoop
cluster). On my machine I have exported $HADOOP_CONF_DIR =
{$HOME}/hadoop/conf/. This path has all the remote cluster conf details
like hive-
Never mind! I figured it out by saving it as hadoopfile and passing the
codec to it. Thank you!
On Tuesday, May 10, 2016, Ajay Chander wrote:
> Hi, I have a folder temp1 in hdfs which has files of multiple formats,
> test1.txt, test2.avsc (Avro file), in it. Now I want to compress these
it. Is there any possible/efficient way to achieve this?
Thanks,
Aj
On Tuesday, May 10, 2016, Ajay Chander wrote:
> I will try that out. Thank you!
>
> On Tuesday, May 10, 2016, Deepak Sharma wrote:
>
>> Yes that's what I intended to say.
>>
>> Thank
Hi Ajay
> You can look at the wholeTextFiles method, which gives an RDD[(String, String)], and then map
> each RDD to saveAsTextFile.
> This will serve the purpose.
> I don't think anything like distcp exists in Spark by default.
>
> Thanks
> Deepak
> On 10 May 2016 11:27 pm, "Aja
I will try that out. Thank you!
On Tuesday, May 10, 2016, Deepak Sharma wrote:
> Yes that's what I intended to say.
>
> Thanks
> Deepak
> On 10 May 2016 11:47 pm, "Ajay Chander" wrote:
>
>> Hi Deepak,
>>Thanks for your response. If I
Hi Everyone,
We are planning to migrate data between 2 clusters, and I see distcp
doesn't support data compression. Is there any efficient way to compress
the data during the migration? Can I implement a Spark job to do this?
Thanks.
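Along the lines suggested earlier in this thread, a minimal sketch for plain-text data only, assuming an existing SparkContext sc; the host names and paths are placeholders, and both namenodes must be reachable from the job:

import org.apache.hadoop.io.compress.GzipCodec

// Read from the source cluster and write gzip-compressed files to the target cluster
val src = sc.textFile("hdfs://source-nn:8020/data/to_migrate")
src.saveAsTextFile("hdfs://target-nn:8020/data/migrated", classOf[GzipCodec])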
Mich,
Can you try the value for paymentdata in this
format, paymentdata = '2015-01-01 23:59:59', with to_date(paymentdate), and see if
it helps.
On Thursday, March 24, 2016, Tamas Szuromi
wrote:
> Hi Mich,
>
> Take a look
> https://spark.apache.org/docs/1.6.1/api/java/org/apache/spark/sql/functions.ht
Hi Everyone, a quick question within this context. What is the underlying
persistent storage that you are using with regard to this
containerized environment? Thanks
On Thursday, March 10, 2016, yanlin wang wrote:
> How do you guys make the driver within the docker container reachable from
>
Hi Ashok,
Try using HiveContext instead of SQLContext. I suspect SQLContext does not
have that functionality. Let me know if it works.
Thanks,
Ajay
On Friday, March 4, 2016, ashokkumar rajendran <
ashokkumar.rajend...@gmail.com> wrote:
> Hi Ayan,
>
> Thanks for the response. I am using SQL query
k dependencies. The link I mentioned before is the one
> you could follow, please read my previous mail.
>
> Thanks
> Saisai
>
>
>
> On Thu, Oct 22, 2015 at 1:56 AM, Ajay Chander wrote:
>
>> Thanks for your kind inputs. Right now I am running spark-1.3.1 on YARN(4
Hi Everyone,
I have a use case where I have to create a DataFrame inside the map()
function. To create a DataFrame, I need a sqlContext or hiveContext. Now how
do I pass the context to my map function? And I am doing it in Java. I
tried creating a class "TestClass" which implements "Function"
and i
>> FB: http://www.facebook.com/meruvian
>> TW: http://www.twitter.com/meruvian / @meruvian
>> Website: http://www.meruvian.org
>>
>> "We grow because we share the same belief."
>>
>>
>> On Wed, Oct 21, 2015 at 12:24 PM, Doug Balog wr
Hi Everyone,
Does anyone have any idea if spark-1.5.1 is available as a service on
Hortonworks? I have spark-1.3.1 installed on the cluster, and it is a
Hortonworks distribution. Now I want to upgrade it to spark-1.5.1. Does anyone here
have any idea about it? Thank you in advance.
Regards,
Ajay
Hi Jacin,
If I were you, the first thing I would do is write a sample Java
application to write data into HDFS and see if it's working fine. Metadata
is being created in HDFS; that means communication to the namenode is working
fine but not to the datanodes, since you don't see any data inside the file.
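A small standalone check along those lines (written in Scala here to match the rest of this digest's code; the namenode URI and test path are placeholders): if the write succeeds and the file has a non-zero length, both the namenode and the datanodes are reachable.

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsWriteTest {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new URI("hdfs://namenode:8020"), new Configuration())
    val testPath = new Path("/tmp/hdfs_write_test.txt")

    val out = fs.create(testPath)              // talks to the namenode to allocate blocks
    out.write("hello hdfs".getBytes("UTF-8"))  // the actual bytes go to the datanodes
    out.close()

    println("written bytes: " + fs.getFileStatus(testPath).getLen)
    fs.close()
  }
}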
4/08/22/spark-submit-in-java/>.
>>
>> I heard rumors that there is some work going on to prepare a Submit API,
>> but I am not a contributor, and I can't say whether it is true or how
>> the work is going.
>> For now the suggested way is to use the provi
Mitchell wrote:
> Hi Ajay,
>
> Are you trying to save to your local file system or to HDFS?
>
> // This would save to HDFS under "/user/hadoop/counter"
> counter.saveAsTextFile("/user/hadoop/counter");
>
> David
>
>
> On Sun, Aug 30, 2015
Hi Everyone,
Recently we have installed Spark on YARN in a Hortonworks cluster. Now I am
trying to run a wordcount program in my Eclipse, and I
did setMaster("local"), and I see the results as expected. Now I want
to submit the same job to my YARN cluster from my Eclipse. In Storm
basically I w