Re: Hive query execution from Spark(through HiveContext) failing with Apache Sentry

2015-06-17 Thread Ajay
Hi there! It seems like you have Read/Execute access permission (and no update/insert/delete access). What operation are you performing? Ajay > On Jun 17, 2015, at 5:24 PM, nitinkak001 wrote: > > I am trying to run a hive query from Spark code using HiveContext object. It > was

Re: send transformed RDD to s3 from slaves

2015-11-14 Thread Ajay
Hi Walrus, Try caching the results just before calling the rdd.count. Regards, Ajay > On Nov 13, 2015, at 7:56 PM, Walrus theCat wrote: > > Hi, > > I have an RDD which crashes the driver when being collected. I want to send > the data on its partitions out to S3 without b
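[Editor's note] A small sketch of that suggestion (the RDD, transformation, and bucket names are hypothetical; assumes the usual Hadoop s3a filesystem support is on the classpath so the write happens from the executors, never the driver):

    val transformed = rawRdd.map(transform).cache()        // materialize once and reuse
    val n = transformed.count()                            // forces evaluation while the data stays cached
    transformed.saveAsTextFile("s3a://my-bucket/output/")  // written directly by the executors, not collected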

PySpark Nested Json Parsing

2015-07-20 Thread Ajay
cheduler.scala:696) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:696) at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1420) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498) at akka.actor.ActorCell.invoke(ActorCell.scala:456) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) at akka.dispatch.Mailbox.run(Mailbox.scala:219) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) What am I doing wrong? Please guide. *Ajay Dubey*

Re: UDTF registration fails for hiveEnabled SQLContext

2018-05-15 Thread Ajay
-- Thanks, Ajay

Re: Does Spark shows logical or physical plan when executing job on the yarn cluster

2018-05-20 Thread Ajay
You can look at the Spark application UI at port 4040. It should tell you all the currently running stages as well as past/future stages. On Sun, May 20, 2018, 12:22 AM giri ar wrote: > Hi, > > > Good Day. > > Could you please let me know whether we can see spark logical or physical > plan while runn

Re: Bulk / Fast Read and Write with MSSQL Server and Spark

2018-05-23 Thread Ajay
-- Thanks, Ajay

Re: Blockmgr directories intermittently not being cleaned up

2018-05-30 Thread Ajay
ut I believe this only pertains to > standalone mode and we are using the mesos deployment mode. So I don't > think this flag actually does anything. > > > Thanks, > Jeff > - > To unsubscribe e-mail: user-unsubscr...@spark.apache.org > > -- Thanks, Ajay

Clarifications on Spark

2014-12-04 Thread Ajay
Hadoop, HBase?. We may use Cassandra/MongoDb/CouchBase as well. 4) Does Spark support RDBMS too? Can we have a single interface to pull out data from multiple data sources? 5) Any recommendations (not limited to usage of Spark) for our specific requirement described above. Thanks Ajay Note : I have

Spark SQL Vs CQL performance on Cassandra

2014-12-11 Thread Ajay
a') 2) It takes around 0.6 seconds using Spark (either SELECT * FROM users WHERE name='Anna' or javaFunctions(sc).cassandraTable("test", "people", mapRowTo(Person.class)).where("name=?", "Anna"); Please let me know if I am missing something in

RDD for Storm Streaming in Spark

2014-12-23 Thread Ajay
Hi, Can we use Storm Streaming as an RDD in Spark? Or is there any way to get Spark to work with Storm? Thanks Ajay

Re: RDD for Storm Streaming in Spark

2014-12-23 Thread Ajay
Hi, The question is about doing streaming in Spark with Storm (not using Spark Streaming). The idea is to use Spark as an in-memory computation engine, with static data coming from Cassandra/HBase and streaming data from Storm. Thanks Ajay On Tue, Dec 23, 2014 at 2:03 PM, Gerard Maas wrote: >

Re: RDD for Storm Streaming in Spark

2014-12-23 Thread Ajay
Right. I contacted the SummingBird users as well. It doesn't support Spark Streaming currently. We are heading towards Storm as it is the most widely used. Is Spark Streaming production ready? Thanks Ajay On Tue, Dec 23, 2014 at 3:47 PM, Gerard Maas wrote: > I'm not aware of a p

Log4J 2 Support

2021-11-09 Thread Ajay Kumar
. Thanks in advance. Regards, Ajay

Unsubscribe

2022-04-28 Thread Ajay Thompson
Unsubscribe

Re: Facing issue with floor function in spark SQL query

2016-03-04 Thread Ajay Chander
Hi Ashok, Try using HiveContext instead of SQLContext. I suspect SQLContext does not have that functionality. Let me know if it works. Thanks, Ajay On Friday, March 4, 2016, ashokkumar rajendran < ashokkumar.rajend...@gmail.com> wrote: > Hi Ayan, > > Thanks for the response.
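[Editor's note] A minimal sketch of the suggestion above (assumes a Spark 1.6-era spark-shell where sc is available; the table name is hypothetical):

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)
    // floor() resolved through the Hive function registry.
    hiveContext.sql("SELECT id, floor(amount) AS amount_floor FROM events").show()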

Re: Problem mixing MESOS Cluster Mode and Docker task execution

2016-03-10 Thread Ajay Chander
Hi Everyone, a quick question within this context: what is the underlying persistent storage that you are using with regards to this containerized environment? Thanks On Thursday, March 10, 2016, yanlin wang wrote: > How you guys make driver docker within container to be reachable from >

Re: Converting a string of format of 'dd/MM/yyyy' in Spark sql

2016-03-24 Thread Ajay Chander
Mich, can you try setting the value of paymentdata to this format, paymentdata='2015-01-01 23:59:59', then apply to_date(paymentdate) and see if it helps. On Thursday, March 24, 2016, Tamas Szuromi wrote: > Hi Mich, > > Take a look > https://spark.apache.org/docs/1.6.1/api/java/org/apache/spark/sql/functions.ht
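[Editor's note] A minimal sketch of that check (Spark 1.6-era spark-shell; the table and column names are assumptions):

    // to_date() expects yyyy-MM-dd[ HH:mm:ss] style strings; dd/MM/yyyy values come back null.
    sqlContext.sql(
      "SELECT paymentdate, to_date(paymentdate) AS payment_day " +
      "FROM payments WHERE paymentdate = '2015-01-01 23:59:59'").show()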

Cluster Migration

2016-05-10 Thread Ajay Chander
Hi Everyone, we are planning to migrate data between 2 clusters and I see that distcp doesn't support data compression. Is there any efficient way to compress the data during the migration? Can I implement a Spark job to do this? Thanks.

Re: Cluster Migration

2016-05-10 Thread Ajay Chander
I will try that out. Thank you! On Tuesday, May 10, 2016, Deepak Sharma wrote: > Yes that's what I intended to say. > > Thanks > Deepak > On 10 May 2016 11:47 pm, "Ajay Chander" > wrote: > >> Hi Deepak, >>Thanks for your response. If I

Re: Cluster Migration

2016-05-10 Thread Ajay Chander
Hi Deepak, Thanks for your response. If I am correct, you suggest reading all of those files into an RDD on the cluster using wholeTextFiles, then applying a compression codec on it and saving the RDD to another Hadoop cluster? Thank you, Ajay On Tuesday, May 10, 2016, Deepak Sharma wrote: >

Re: Cluster Migration

2016-05-10 Thread Ajay Chander
it. Is there any possible/efficient way to achieve this? Thanks, Aj On Tuesday, May 10, 2016, Ajay Chander wrote: > I will try that out. Thank you! > > On Tuesday, May 10, 2016, Deepak Sharma > wrote: > >> Yes that's what I intended to say. >> >> Thank

Re: Cluster Migration

2016-05-10 Thread Ajay Chander
Never mind! I figured it out by saving it as a Hadoop file and passing the codec to it. Thank you! On Tuesday, May 10, 2016, Ajay Chander wrote: > Hi, I have a folder temp1 in hdfs which have multiple format files > test1.txt, test2.avsc (Avro file) in it. Now I want to compress these
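[Editor's note] A minimal sketch of that kind of compressed save (assumes a spark-shell; the HDFS paths are hypothetical, and saveAsTextFile with a codec class is used here as the simplest variant of the "save as Hadoop file with a codec" approach mentioned above):

    import org.apache.hadoop.io.compress.GzipCodec

    // Read the source folder, then write it back out gzip-compressed on the target cluster.
    val data = sc.textFile("hdfs://source-nn:8020/user/data/temp1")
    data.saveAsTextFile("hdfs://target-nn:8020/user/data/temp1_gz", classOf[GzipCodec])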

Hive_context

2016-05-23 Thread Ajay Chander
Hi Everyone, I am building a Java Spark application in eclipse IDE. From my application I want to use hiveContext to read tables from the remote Hive(Hadoop cluster). On my machine I have exported $HADOOP_CONF_DIR = {$HOME}/hadoop/conf/. This path has all the remote cluster conf details like hive-

Re: Hive_context

2016-05-23 Thread Ajay Chander
gards, Aj On Monday, May 23, 2016, Ajay Chander wrote: > Hi Everyone, > > I am building a Java Spark application in eclipse IDE. From my application > I want to use hiveContext to read tables from the remote Hive(Hadoop > cluster). On my machine I have exported $HADOOP_CONF_DIR =

Re: Hive_context

2016-05-24 Thread Ajay Chander
wn where the issue is ? > > > Sent from my iPhone > > On May 23, 2016, at 5:26 PM, Ajay Chander > wrote: > > I downloaded the spark 1.5 untilities and exported SPARK_HOME pointing to > it. I copied all the cluster configuration files(hive-site.xml, > hdfs-site.xml etc

Spark_API_Copy_From_Edgenode

2016-05-27 Thread Ajay Chander
Hi Everyone, I have some data located on the Edgenode. Right now, the process I follow to copy the data from the Edgenode to HDFS is through a shell script which resides on the Edgenode. In Oozie I am using an SSH action to execute the shell script on the Edgenode, which copies the dat

Re: Spark_API_Copy_From_Edgenode

2016-05-28 Thread Ajay Chander
Hi Everyone, Any insights on this thread? Thank you. On Friday, May 27, 2016, Ajay Chander wrote: > Hi Everyone, > >I have some data located on the EdgeNode. Right > now, the process I follow to copy the data from Edgenode to HDFS is through > a sh

Re: how to get file name of record being reading in spark

2016-05-31 Thread Ajay Chander
Hi Vikash, These are my thoughts: read the input directory using wholeTextFiles(), which gives a paired RDD with the file name as key and the file content as value. Then you can apply a map function to read each line and append the key to the content. Thank you, Aj On Tuesday, May 31, 2016, Vikash Kumar
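[Editor's note] A small sketch of that idea (spark-shell; the input path is hypothetical):

    // (fileName, fileContent) pairs for every file under the directory.
    val files = sc.wholeTextFiles("hdfs:///user/data/input")
    // Tag each line with the file it came from.
    val tagged = files.flatMap { case (name, content) =>
      content.split("\n").map(line => s"$name\t$line")
    }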

Spark_Usecase

2016-06-07 Thread Ajay Chander
Hi Spark users, Right now we are using Spark for everything (loading the data from SQL Server, applying transformations, saving it as permanent tables in Hive) in our environment. Everything is being done in one Spark application. The only thing we do before we launch our Spark application through Oozie

Re: Spark_Usecase

2016-06-07 Thread Ajay Chander
ur hd >> 2. use spark-streanming to read data from that directory and store it >> into hdfs >> >> perhaps there is some sort of spark 'connectors' that allows you to read >> data from a db directly so you dont need to go via spk streaming? >> >>

Re: Spark_Usecase

2016-06-07 Thread Ajay Chander
y on your hd >>> 2. use spark-streanming to read data from that directory and store it >>> into hdfs >>> >>> perhaps there is some sort of spark 'connectors' that allows you to read >>> data from a db directly so you dont need to go vi

Re: Spark_Usecase

2016-06-07 Thread Ajay Chander
; But you can maintain a file e.g. extractRange.conf in hdfs , to read from > it the end range and update it with new end range from spark job before it > finishes with the new relevant ranges to be used next time. > > On Tue, Jun 7, 2016 at 8:49 PM, Ajay Chander > wrote: > >>

SAS_TO_SPARK_SQL_(Could be a Bug?)

2016-06-10 Thread Ajay Chander
_dt| |SR_NO|start_dt|end_dt| |SR_NO|start_dt|end_dt| |SR_NO|start_dt|end_dt| +-++--+ Since both programs are using the same driver, com.sas.rio.MVADriver, the expected output should be the same as my pure Java program's output. But something else is happening behind the scenes. A

Re: SAS_TO_SPARK_SQL_(Could be a Bug?)

2016-06-10 Thread Ajay Chander
Hi again, has anyone in this group tried to access a SAS dataset through Spark SQL? Thank you Regards, Ajay On Friday, June 10, 2016, Ajay Chander wrote: > Hi Spark Users, > > I hope everyone here are doing great. > > I am trying to read data from SAS through Spark SQL and

Re: SAS_TO_SPARK_SQL_(Could be a Bug?)

2016-06-10 Thread Ajay Chander
9:05,935] INFO ps(2.1)#executeQuery SELECT "SR_NO","start_dt","end_dt" FROM sasLib.run_control ; created result set 2.1.1; time= 0.102 secs (com.sas.rio.MVAStatement:590) Please find complete program and full logs attached in the below thread. Thank you. Regards, Ajay On Fr

Re: SAS_TO_SPARK_SQL_(Could be a Bug?)

2016-06-10 Thread Ajay Chander
ID > , CLUSTERED > , SCATTERED > , RANDOMISED > , RANDOM_STRING > , SMALL_VC > , PADDING > FROM tmp > """ >HiveContext.sql(sqltext) > println ("\nFinished at"); sqlContext.sql("SELE

Re: SAS_TO_SPARK_SQL_(Could be a Bug?)

2016-06-11 Thread Ajay Chander
I tried implementing the same functionality through Scala as well. But no luck so far. Just wondering if anyone here has tried using Spark SQL to read a SAS dataset? Thank you Regards, Ajay On Friday, June 10, 2016, Ajay Chander wrote: > Mich, I completely agree with you. I built another Spark

SAS_TO_SPARK_SQL_(Could be a Bug?)

2016-06-13 Thread Ajay Chander
providing the table name? Yes, I did that too. It did not make any difference. Thank you, Ajay On Sunday, June 12, 2016, Mohit Jaggi wrote: > Looks like a bug in the code generating the SQL query…why would it be > specific to SAS, I can’t guess. Did you try the same with another database?

Re: Spark support for update/delete operations on Hive ORC transactional tables

2016-06-22 Thread Ajay Chander
. Regards, Ajay On Thursday, June 2, 2016, Mich Talebzadeh wrote: > thanks for that. > > I will have a look > > Dr Mich Talebzadeh > > > > LinkedIn * > https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw > <http

Re: Spark support for update/delete operations on Hive ORC transactional tables

2016-06-22 Thread Ajay Chander
Thanks for the confirmation Mich! On Wednesday, June 22, 2016, Mich Talebzadeh wrote: > Hi Ajay, > > I am afraid for now transaction heart beat do not work through Spark, so I > have no other solution. > > This is interesting point as with Hive running on Spark engine there

SPARK-8813 - combining small files in spark sql

2016-07-06 Thread Ajay Srivastava
in spark 2.0? I searched the commits in the 2.0 branch and it looks like I need to use spark.sql.files.openCostInBytes, but I am not sure. Regards, Ajay
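[Editor's note] For reference, a minimal sketch of the two file-packing settings involved (assumes a Spark 2.0 SparkSession named spark; the values and path are illustrative only):

    // Both options influence how many small files are packed into one read partition.
    spark.conf.set("spark.sql.files.openCostInBytes", "4194304")        // estimated cost of opening a file, in bytes
    spark.conf.set("spark.sql.files.maxPartitionBytes", "134217728")    // upper bound per partition, in bytes
    val df = spark.read.parquet("/data/many-small-files")               // hypothetical path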

Spark Streaming : Limiting number of receivers per executor

2016-02-10 Thread ajay garg
Hi All, I am running 3 executors in my Spark Streaming application with 3 cores per executor. I have written my custom receiver for receiving network data. In my current configuration I am launching 3 receivers, one receiver per executor. During the run, if 2 of my executors die, I am left

Spark UI documentaton needed

2016-02-22 Thread Ajay Gupta
Hi Sparklers, Could you provide more elaborate documentation of the Spark UI? There are many fields in it and we do not know much about them. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-UI-documentaton-needed-tp26300.html Sent from the Apache Spark Use

Re: saveAsTextFile creates an empty folder in HDFS

2015-10-03 Thread Ajay Chander
e the file. Why don't you check the HDFS logs and see what's happening when your application talks to the namenode? I suspect some networking issue, or check if the datanodes are running fine. Thank you, Ajay On Saturday, October 3, 2015, Jacinto Arias wrote: > Yes printing the result

Spark_1.5.1_on_HortonWorks

2015-10-20 Thread Ajay Chander
Hi Everyone, Does anyone have any idea if spark-1.5.1 is available as a service on HortonWorks? I have spark-1.3.1 installed on the cluster and it is a HortonWorks distribution. Now I want to upgrade it to spark-1.5.1. Does anyone here have any idea about it? Thank you in advance. Regards, Ajay

Re: Spark_1.5.1_on_HortonWorks

2015-10-21 Thread Ajay Chander
that I can get it upgraded through the Ambari UI? If possible, can anyone point me to documentation online? Thank you. Regards, Ajay On Wednesday, October 21, 2015, Saisai Shao wrote: > Hi Frans, > > You could download Spark 1.5.1-hadoop 2.6 pre-built tarball and copy into > HDP 2.

Spark_sql

2015-10-21 Thread Ajay Chander
Hi Everyone, I have a use case where I have to create a DataFrame inside the map() function. To create a DataFrame it needs a sqlContext or hiveContext. Now how do I pass the context to my map function? And I am doing it in Java. I tried creating a class "TestClass" which implements "Function" and i

Spark_1.5.1_on_HortonWorks

2015-10-21 Thread Ajay Chander
main(HistoryServer.scala:231) at org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala) I went to the lib folder and noticed that "spark-assembly-1.5.1-hadoop2.6.0.jar" is missing that class. I was able to get the spark history server started with 1.3.1 but not 1.5.1. Any inputs on this? Really appreciat

Re: ERROR SparkUI: Failed to bind SparkUI java.net.BindException: Address already in use: Service 'SparkUI' failed after 16 retries!

2015-07-24 Thread Ajay Singal
rs, e.g., 4040, 4041, 4042... Thanks, Ajay On Fri, Jul 24, 2015 at 6:21 AM, Joji John wrote: > HI, > > I am getting this error for some of spark applications. I have multiple > spark applications running in parallel. Is there a limit in the number of > spark applications that I c

Re: Facing problem in Oracle VM Virtual Box

2015-07-24 Thread Ajay Singal
this helps. Ajay On Thu, Jul 23, 2015 at 6:40 AM, Chintan Bhatt < chintanbhatt...@charusat.ac.in> wrote: > Hi. > I'm facing following error while running .ova file containing Hortonworks > with Spark in Oracle VM Virtual Box: > > Failed to open a session for the vi

Re: ERROR SparkUI: Failed to bind SparkUI java.net.BindException: Address already in use: Service 'SparkUI' failed after 16 retries!

2015-07-24 Thread Ajay Singal
Hi Joji, To my knowledge, Spark does not offer any such function. I agree, defining a function to find an open (random) port would be a good option. However, in order to invoke the corresponding SparkUI one needs to know this port number. Thanks, Ajay On Fri, Jul 24, 2015 at 10:19 AM, Joji
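[Editor's note] One way to avoid the bind retries altogether (a sketch only, assuming the standard spark.ui.port property; setting it to 0 lets the OS pick a free ephemeral port, whose value then appears in the driver log line announcing the SparkUI address):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("my-app")            // hypothetical application name
      .set("spark.ui.port", "0")       // 0 = bind to any free port instead of retrying 4040, 4041, ...
    val sc = new SparkContext(conf)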

Re: How to increase parallelism of a Spark cluster?

2015-08-03 Thread Ajay Singal
ity response, and if needed, I will open a JIRA item. I hope it helps. Regards, Ajay On Mon, Aug 3, 2015 at 1:16 PM, Sujit Pal wrote: > @Silvio: the mapPartitions instantiates a HttpSolrServer, then for each > query string in the partition, sends the query to Solr using SolrJ, and > gets

Re: Controlling number of executors on Mesos vs YARN

2015-08-13 Thread Ajay Singal
Hi Tim, An option like spark.mesos.executor.max to cap the number of executors per node/application would be very useful. However, having an option like spark.mesos.executor.num to specify the desired number of executors per node would provide even better control. Thanks, Ajay On Wed, Aug

Re: Controlling number of executors on Mesos vs YARN

2015-08-13 Thread Ajay Singal
s can specify desirable number of executors. If not available, Mesos (in a simple implementation) can provide/offer whatever is available. In a slightly complex implementation, we can build a simple protocol to negotiate. Regards, Ajay On Wed, Aug 12, 2015 at 5:51 PM, Tim Chen wrote: > You're

submit_spark_job_to_YARN

2015-08-30 Thread Ajay Chander
e. But somehow it's not happening. Please tell me if my assumption is wrong or if I am missing anything here. I have attached the word count program that I was using. Any help is highly appreciated. Thank you, Ajay submit_spark_job Description: Binary data -

Re: submit_spark_job_to_YARN

2015-08-30 Thread Ajay Chander
Hi David, Thanks for responding! My main intention was to submit a Spark job/jar to the YARN cluster from my Eclipse, within the code. Is there any way I could pass my YARN configuration somewhere in the code to submit the jar to the cluster? Thank you, Ajay On Sunday, August 30, 2015, David

Re: submit_spark_job_to_YARN

2015-08-30 Thread Ajay Chander
l.com > > wrote: > >> Hi Ajay, >> >> In short story: No, there is no easy way to do that. But if you'd like to >> play around this topic a good starting point would be this blog post from >> sequenceIQ: blog >> <http://blog.sequenceiq.com/blog/201
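[Editor's note] For completeness, one programmatic route that existed at the time is Spark's launcher API (a sketch only, not what was recommended in this thread; it still shells out to spark-submit, so SPARK_HOME and HADOOP_CONF_DIR must point at the right installation and cluster configs, and the jar/class names below are hypothetical):

    import org.apache.spark.launcher.SparkLauncher

    val proc = new SparkLauncher()
      .setAppResource("/path/to/word-count.jar")        // hypothetical application jar
      .setMainClass("com.example.WordCount")            // hypothetical main class
      .setMaster("yarn-cluster")
      .setConf(SparkLauncher.EXECUTOR_MEMORY, "2g")
      .launch()                                         // spawns spark-submit as a child process
    proc.waitFor()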

Re: JavaRDD using Reflection

2015-09-14 Thread Ajay Singal
n driver. Hope this helps! Ajay On Mon, Sep 14, 2015 at 1:21 PM, Ankur Srivastava < ankur.srivast...@gmail.com> wrote: > Hi Rachana > > I didn't get you r question fully but as the error says you can not > perform a rdd transformation or action inside another transformati

Spark_JDBC_Partitions

2016-09-10 Thread Ajay Chander
ER) AS MAX_MOD_VAL FROM DUAL"' ? Any pointers are appreciated. Thanks for your time. ~ Ajay
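[Editor's note] In case it helps, a minimal sketch of partitioned JDBC reads against Oracle (assumes the standard DataFrameReader.jdbc overload with a partition column; the connection details, table, and MOD-based split column are hypothetical):

    val props = new java.util.Properties()
    props.setProperty("user", "scott")         // hypothetical credentials
    props.setProperty("password", "tiger")

    // Spark issues one bounded query per partition on the derived part_id column.
    val df = sqlContext.read.jdbc(
      "jdbc:oracle:thin:@//dbhost:1521/ORCL",
      "(SELECT t.*, MOD(ROWNUM, 16) AS part_id FROM big_table t) q",
      "part_id", 0L, 16L, 16, props)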

Re: Spark_JDBC_Partitions

2016-09-19 Thread Ajay Chander
t;> // maropu >>>>>> >>>>>> >>>>>> On Sun, Sep 11, 2016 at 12:37 AM, Mich Talebzadeh < >>>>>> mich.talebza...@gmail.com> wrote: >>>>>> >>>>>>> Strange that Oracle table of

Spark_Jdbc_Hive

2016-10-03 Thread Ajay Chander
using IntelliJ IDE, Maven as build tool and Java . Things that I have got working, - Since the cluster is secured using Kerberos, I had to use a keytab file to authenticate like below, System.setProperty("java.security.krb5.conf", "C:\\Users\\Ajay\\Documents\\Kerb
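[Editor's note] A minimal sketch of that kind of keytab login (assumes Hadoop's UserGroupInformation API; the krb5.conf path, principal, and keytab path are hypothetical placeholders, not the ones from the original mail):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.security.UserGroupInformation

    System.setProperty("java.security.krb5.conf", "C:\\Users\\Ajay\\Documents\\Kerberos\\krb5.conf")

    val hadoopConf = new Configuration()
    hadoopConf.set("hadoop.security.authentication", "kerberos")
    UserGroupInformation.setConfiguration(hadoopConf)
    UserGroupInformation.loginUserFromKeytab("myuser@EXAMPLE.COM", "C:\\Users\\Ajay\\Documents\\Kerberos\\myuser.keytab")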

UseCase_Design_Help

2016-10-04 Thread Ajay Chander
distinct rows/Animal types. DF2 has a few million rows. What's the best way to achieve this efficiently using parallelism? Any inputs are helpful. Thank you. Regards, Ajay

Re: UseCase_Design_Help

2016-10-04 Thread Ajay Chander
Right now, I am doing it like below, import scala.io.Source val animalsFile = "/home/ajay/dataset/animal_types.txt" val animalTypes = Source.fromFile(animalsFile).getLines.toArray for ( anmtyp <- animalTypes ) { val distinctAnmTypCount = sqlContext.sql("select count

Re: UseCase_Design_Help

2016-10-05 Thread Ajay Chander
Hi Ayan, My Schema for DF2 is fixed but it has around 420 columns (70 Animal type columns and 350 other columns). Thanks, Ajay On Wed, Oct 5, 2016 at 10:37 AM, ayan guha wrote: > Is your schema for df2 is fixed? ie do you have 70 category columns? > > On Thu, Oct 6, 2016 at 12:50 A

Re: UseCase_Design_Help

2016-10-05 Thread Ajay Chander
d api, so it will be read sequentially. >> >> Furthermore you are going to need to create a schema if you want to use >> dataframes. >> >> On 5/10/2016 1:53, "Ajay Chander" wrote: >> >>> Right now, I am doing it like below, >>> &g

UseCase_Design_Help

2016-10-05 Thread Ajay Chander
=10 and count(distinct(element)) > 10 respectively. Thanks, Ajay On Wed, Oct 5, 2016 at 11:12 AM, ayan guha wrote: > Hi > > You can "generate" a sql through program. Python Example: > > >>> schema > ['id', 'Mammals', 'Birds&#
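[Editor's note] A sketch of the "generate the SQL in code" idea in Scala (assumes the 70 animal-type column names are available locally and that DF2 is registered as a temp table named df2 — both assumptions):

    val animalCols = Seq("Mammals", "Birds", "Reptiles")   // illustrative subset of the ~70 columns
    // One aggregate per column, so the table is scanned only once instead of once per type.
    val selectList = animalCols
      .map(c => s"count(distinct $c) AS ${c}_distinct_cnt")
      .mkString(", ")
    val counts = sqlContext.sql(s"SELECT $selectList FROM df2")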

Code review / sqlContext Scope

2016-10-08 Thread Ajay Chander
your inputs on this. $ cat /home/ajay/flds.txt PHARMY_NPI_ID ALT_SUPPLIER_STORE_NBR MAIL_SERV_NBR spark-shell --name hivePersistTest --master yarn --deploy-mode client val dataElementsFile = "/home/ajay/flds.txt" val dataElements = Source.fromFile(dataElementsFile).getLines.to

Re: Code review / sqlContext Scope

2016-10-19 Thread Ajay Chander
t.sql("set hive.exec.dynamic.partition.mode=nonstrict") val dataElementsFile = "hdfs://nameservice/user/ajay/spark/flds.txt" //deDF has only 61 rows val deDF = sqlContext.read.text(dataElementsFile).toDF("DataElement").coalesce(1).distinct().cache() deDF.wi

HiveContext is Serialized?

2016-10-25 Thread Ajay Chander
apache.spark.sql.hive.HiveContext and I see that it extends SQLContext, which extends Logging with Serializable. Can anyone tell me if this is the right way to use it? Thanks for your time. Regards, Ajay

Re: HiveContext is Serialized?

2016-10-25 Thread Ajay Chander
(AsynchronousListenerBus.scala:64) at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1181) at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63) Thanks, Ajay On Tue, Oct 25, 2016 at 11:45 PM, Jeff Zhang wrote: > > In your sample cod

Re: HiveContext is Serialized?

2016-10-25 Thread Ajay Chander
Sunita, Thanks for your time. In my scenario, based on each attribute from deDF(1 column with just 66 rows), I have to query a Hive table and insert into another table. Thanks, Ajay On Wed, Oct 26, 2016 at 12:21 AM, Sunita Arvind wrote: > Ajay, > > Afaik Generally these contexts

Re: HiveContext is Serialized?

2016-10-25 Thread Ajay Chander
Sean, thank you for making it clear. It was helpful. Regards, Ajay On Wednesday, October 26, 2016, Sean Owen wrote: > This usage is fine, because you are only using the HiveContext locally on > the driver. It's applied in a function that's used on a Scala collection. >
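[Editor's note] A small sketch of the distinction Sean describes (hypothetical table and variable names; the point is that the loop runs on the driver over a plain Scala collection, so the HiveContext is only ever used driver-side):

    // deDF is tiny, so pulling it to the driver is cheap.
    val elements: Array[String] = deDF.collect().map(_.getString(0))

    // Driver-side loop: each hiveContext.sql call is issued from the driver, never serialized into tasks.
    elements.foreach { col =>
      hiveContext.sql(s"INSERT INTO TABLE target_tbl SELECT $col FROM source_tbl")
    }

    // By contrast, someRdd.map(col => hiveContext.sql(...)) would try to ship the context to executors and fail.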

Access_Remote_Kerberized_Cluster_Through_Spark

2016-11-07 Thread Ajay Chander
from quite a while ago. Please let me know if you need more info. Thanks Regards, Ajay

Re: Access_Remote_Kerberized_Cluster_Through_Spark

2016-11-07 Thread Ajay Chander
Did anyone use https://www.codatlas.com/github.com/apache/spark/HEAD/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala to interact with secured Hadoop from Spark ? Thanks, Ajay On Mon, Nov 7, 2016 at 4:37 PM, Ajay Chander wrote: > > Hi Everyone, > > I am trying

Re: Access_Remote_Kerberized_Cluster_Through_Spark

2016-11-09 Thread Ajay Chander
eption: Can't get Master Kerberos principal for use as renewer sc.textFile("hdfs://vm1.comp.com:8020/user/myusr/temp/file1").collect().foreach(println) //Getting this error: java.io.IOException: Can't get Master Kerberos principal for use as renewer } } On Mon,

Re: Spark read csv option - capture exception in a column in permissive mode

2019-06-16 Thread Ajay Thompson
to add the column in the schema > that you are using to read. > > Regards, > Gourav > > On Sun, Jun 16, 2019 at 2:48 PM wrote: > >> Hi Team, >> >> >> >> Can we have another column which gives the corrupted record reason in >> permissive mode while reading csv. >> >> >> >> Thanks, >> >> Ajay >> >
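[Editor's note] For reference, a minimal sketch of the suggestion in the reply above (Spark's built-in CSV reader with its columnNameOfCorruptRecord option; it captures the raw malformed line rather than a textual reason, and the schema, path, and column names are illustrative):

    import org.apache.spark.sql.types._

    val schema = StructType(Seq(
      StructField("id", IntegerType),
      StructField("amount", DoubleType),
      StructField("_corrupt_record", StringType)   // must appear in the schema to be populated
    ))

    val df = spark.read
      .option("header", "true")
      .option("mode", "PERMISSIVE")
      .option("columnNameOfCorruptRecord", "_corrupt_record")
      .schema(schema)
      .csv("/data/input.csv")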

Re: Data locality across jobs

2015-04-03 Thread Ajay Srivastava
ether. Regards,Ajay On Friday, April 3, 2015 2:01 AM, Sandy Ryza wrote: This isn't currently a capability that Spark has, though it has definitely been discussed: https://issues.apache.org/jira/browse/SPARK-1061.  The primary obstacle at this point is that Hadoop's

Instantiating/starting Spark jobs programmatically

2015-04-20 Thread Ajay Singal
nformation/tips/best-practices in this regard? Cheers! Ajay -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Instantiating-starting-Spark-jobs-programmatically-tp22577.html Sent from the Apache Spark User List mailing list archive at

Join : Giving incorrect result

2014-06-04 Thread Ajay Srivastava
and looks correct. But when a single worker is used with two or more cores, the result seems to be random. Every time, the count of joined records is different. Does this sound like a defect, or do I need to take care of something while using join? I am using spark-0.9.1. Regards Ajay

Re: Join : Giving incorrect result

2014-06-05 Thread Ajay Srivastava
enough. Thanks Chen for your observation. I get this problem on a single worker, so there will not be any mismatch of jars. On two workers, since executor memory gets doubled, the code works fine. Regards, Ajay On Thursday, June 5, 2014 1:35 AM, Matei Zaharia wrote: If this isn’t the probl

Re: Join : Giving incorrect result

2014-06-06 Thread Ajay Srivastava
Thanks Matei. We have tested the fix and it's working perfectly. Andrew, we set spark.shuffle.spill=false but the application goes out of memory. I think that is expected. Regards,Ajay On Friday, June 6, 2014 3:49 AM, Andrew Ash wrote: Hi Ajay, Can you please try running the

Map with filter on JavaRdd

2014-06-26 Thread ajay garg
Hi All, Is it possible to map and filter a JavaRDD in a single operation? Thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Map-with-filter-on-JavaRdd-tp8401.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Map with filter on JavaRdd

2014-06-27 Thread ajay garg
Thanks Mayur for clarification.. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Map-with-filter-on-JavaRdd-tp8401p8410.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

OFF_HEAP storage level

2014-07-03 Thread Ajay Srivastava
also explain the behavior of storage level - NONE ? Regards, Ajay

Re: OFF_HEAP storage level

2014-07-04 Thread Ajay Srivastava
Thanks Jerry. It looks like a good option, will try it. Regards, Ajay On Friday, July 4, 2014 2:18 PM, "Shao, Saisai" wrote: Hi Ajay, StorageLevel OFF_HEAP means you can cache your RDD into Tachyon; the prerequisite is that you should deploy Tachyon alongside Spark. Yes, it can

Spark summit 2014 videos ?

2014-07-10 Thread Ajay Srivastava
Hi, I did not find any videos on the Apache Spark channel on YouTube yet. Any idea when these will be made available? Regards, Ajay

Joined RDD

2014-11-12 Thread ajay garg
. Since no data is cached in Spark, how is the action on C served without reading data from disk? Thanks --Ajay -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Joined-RDD-tp18820.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: Joined RDD

2014-11-13 Thread ajay garg
Yes, that is my understanding of how it should work. But in my case, when I call collect the first time, it reads the data from files on disk. Subsequent collect queries are not reading the data files (verified from the logs). On the Spark UI I see only shuffle read and no shuffle write. -- View this mes

Creating RDD from only few columns of a Parquet file

2015-01-12 Thread Ajay Srivastava
think that spark is reading all the columns from disk in the case of table1 when it needs only 3 columns. How can I make sure that it reads only 3 of the 10 columns from disk? Regards, Ajay

Re: Creating RDD from only few columns of a Parquet file

2015-01-13 Thread Ajay Srivastava
Setting spark.sql.hive.convertMetastoreParquet to true has fixed this. Regards, Ajay On Tuesday, January 13, 2015 11:50 AM, Ajay Srivastava wrote: Hi, I am trying to read a parquet file using - val parquetFile = sqlContext.parquetFile("people.parquet") There is no way
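[Editor's note] A minimal sketch of that fix plus a pruned read (assumes a Spark 1.2-era spark-shell where sqlContext is a HiveContext; table and column names are hypothetical):

    sqlContext.setConf("spark.sql.hive.convertMetastoreParquet", "true")
    // With the native Parquet path enabled, selecting only the needed columns
    // lets the reader skip the other columns on disk.
    val slim = sqlContext.sql("SELECT col1, col2, col3 FROM table1")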

Some tasks are taking long time

2015-01-15 Thread Ajay Srivastava
cala:262)     at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)     at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)     Any inputs/suggestions to improve job time will be appreciated. Regards,Ajay

Re: Some tasks are taking long time

2015-01-15 Thread Ajay Srivastava
Thanks RK. I can turn on speculative execution, but I am trying to find out the actual reason for the delay, as it happens on any node. Any idea about the stack trace in my previous mail? Regards, Ajay On Thursday, January 15, 2015 8:02 PM, RK wrote: If you don't want a few slow tas

Re: Some tasks are taking long time

2015-01-15 Thread Ajay Srivastava
Thanks Nicos. GC does not contribute much to the execution time of the task. I will debug it further today. Regards, Ajay On Thursday, January 15, 2015 11:55 PM, Nicos wrote: Ajay, Unless we are dealing with some synchronization/conditional variable bug in Spark, try this per tuning