Hi there!
It seems like you have Read/Execute access permission (and no
update/insert/delete access). What operation are you performing?
Ajay
> On Jun 17, 2015, at 5:24 PM, nitinkak001 wrote:
>
> I am trying to run a hive query from Spark code using HiveContext object. It
> was
Hi Walrus,
Try caching the results just before calling the rdd.count.
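A minimal sketch of that suggestion, with hypothetical names since I don't have your code:
val results = input.map(parse)  // whatever transformation produces the RDD
results.cache()                 // mark the RDD for caching before the first action
val n = results.count()         // count materializes the partitions and caches them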
Regards,
Ajay
> On Nov 13, 2015, at 7:56 PM, Walrus theCat wrote:
>
> Hi,
>
> I have an RDD which crashes the driver when being collected. I want to send
> the data on its partitions out to S3 without b
cheduler.scala:696)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:696)
at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1420)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
What am I doing wrong? Please guide.
*Ajay Dubey*
--
Thanks,
Ajay
You can look at the Spark application UI at port 4040. It should tell you all
the currently running stages as well as past/future stages.
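If the UI alone is not enough, another option (not mentioned above) is to print the plans programmatically; a minimal sketch, assuming a DataFrame named df:
df.explain(true)  // prints the parsed, analyzed, and optimized logical plans plus the physical plan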
On Sun, May 20, 2018, 12:22 AM giri ar wrote:
> Hi,
>
>
> Good Day.
>
> Could you please let me know whether we can see spark logical or physical
> plan while runn
--
Thanks,
Ajay
but I believe this only pertains to
> standalone mode and we are using the mesos deployment mode. So I don't
> think this flag actually does anything.
>
>
> Thanks,
> Jeff
--
Thanks,
Ajay
Hadoop, HBase? We may use
Cassandra/MongoDB/CouchBase as well.
4) Does Spark support RDBMS too? Can we have a single interface to pull
data from multiple data sources?
5) Any recommendations (not limited to usage of Spark) for our specific
requirement described above.
Thanks
Ajay
Note : I have
a')
2) It takes around 0.6 seconds using Spark (either SELECT * FROM users WHERE
name='Anna' or javaFunctions(sc).cassandraTable("test", "people",
mapRowTo(Person.class)).where("name=?", "Anna")).
Please let me know if I am missing something in
Hi,
Can we use Storm Streaming as an RDD in Spark? Or is there any way to get
Spark to work with Storm?
Thanks
Ajay
Hi,
The question is how to do streaming in Spark with Storm (not using Spark
Streaming).
The idea is to use Spark as an in-memory computation engine, with static data
coming from Cassandra/HBase and streaming data from Storm.
Thanks
Ajay
On Tue, Dec 23, 2014 at 2:03 PM, Gerard Maas wrote:
>
Right. I contacted the SummingBird users as well. It doesn't support Spark
Streaming currently.
We are heading towards Storm as it is most widely used. Is Spark
Streaming production ready?
Thanks
Ajay
On Tue, Dec 23, 2014 at 3:47 PM, Gerard Maas wrote:
> I'm not aware of a p
.
Thanks in advance.
Regards,
Ajay
Hi Ashok,
Try using HiveContext instead of SQLContext. I suspect SQLContext does not
have that functionality. Let me know if it works.
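A minimal sketch of the switch on the Spark 1.x API (the table name is just a placeholder):
import org.apache.spark.sql.hive.HiveContext
val hiveContext = new HiveContext(sc)                 // sc is the existing SparkContext
val df = hiveContext.sql("SELECT * FROM some_table")  // hypothetical table name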
Thanks,
Ajay
On Friday, March 4, 2016, ashokkumar rajendran <
ashokkumar.rajend...@gmail.com> wrote:
> Hi Ayan,
>
> Thanks for the response.
Hi Everyone, a quick question within this context. What is the underlying
persistent storage that you are using with regard to this
containerized environment? Thanks
On Thursday, March 10, 2016, yanlin wang wrote:
> How you guys make driver docker within container to be reachable from
>
Mich,
Can you try setting the value of paymentdata to this
format, paymentdata='2015-01-01 23:59:59', together with to_date(paymentdate),
and see if it helps.
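Roughly what I have in mind, sketched below; the table name is a guess on my part:
sqlContext.sql(
  "SELECT to_date(paymentdate) FROM payments " +   // 'payments' is a placeholder table name
  "WHERE paymentdata = '2015-01-01 23:59:59'")     // timestamp string in yyyy-MM-dd HH:mm:ss format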
On Thursday, March 24, 2016, Tamas Szuromi
wrote:
> Hi Mich,
>
> Take a look
> https://spark.apache.org/docs/1.6.1/api/java/org/apache/spark/sql/functions.ht
Hi Everyone,
we are planning to migrate the data between two clusters, and I see distcp
doesn't support data compression. Is there any efficient way to compress
the data during the migration? Can I implement a Spark job to do this?
Thanks.
I will try that out. Thank you!
On Tuesday, May 10, 2016, Deepak Sharma wrote:
> Yes that's what I intended to say.
>
> Thanks
> Deepak
> On 10 May 2016 11:47 pm, "Ajay Chander" > wrote:
>
>> Hi Deepak,
>>Thanks for your response. If I
Hi Deepak,
Thanks for your response. If I am correct, you suggest reading all
of those files into an RDD on the cluster using wholeTextFiles, applying a
compression codec on it, and saving the RDD to another Hadoop cluster?
Thank you,
Ajay
On Tuesday, May 10, 2016, Deepak Sharma wrote:
>
it. Is there any possible/efficient way to achieve this?
Thanks,
Aj
On Tuesday, May 10, 2016, Ajay Chander wrote:
> I will try that out. Thank you!
>
> On Tuesday, May 10, 2016, Deepak Sharma > wrote:
>
>> Yes that's what I intended to say.
>>
>> Thank
Never mind! I figured it out by saving it as a Hadoop file and passing the
codec to it. Thank you!
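For anyone who hits the same thing, a minimal sketch of what I mean (paths and the codec choice are placeholders):
import org.apache.hadoop.io.compress.GzipCodec
val files = sc.wholeTextFiles("hdfs://cluster1/user/ajay/temp1")                // (fileName, content) pairs
files.saveAsTextFile("hdfs://cluster2/user/ajay/temp1_gz", classOf[GzipCodec])  // compressed on write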
On Tuesday, May 10, 2016, Ajay Chander wrote:
> Hi, I have a folder temp1 in hdfs which have multiple format files
> test1.txt, test2.avsc (Avro file) in it. Now I want to compress these
Hi Everyone,
I am building a Java Spark application in the Eclipse IDE. From my application
I want to use HiveContext to read tables from the remote Hive (Hadoop
cluster). On my machine I have exported $HADOOP_CONF_DIR =
{$HOME}/hadoop/conf/. This path has all the remote cluster conf details
like hive-
gards,
Aj
On Monday, May 23, 2016, Ajay Chander wrote:
> Hi Everyone,
>
> I am building a Java Spark application in eclipse IDE. From my application
> I want to use hiveContext to read tables from the remote Hive(Hadoop
> cluster). On my machine I have exported $HADOOP_CONF_DIR =
wn where the issue is ?
>
>
> Sent from my iPhone
>
> On May 23, 2016, at 5:26 PM, Ajay Chander > wrote:
>
> I downloaded the spark 1.5 untilities and exported SPARK_HOME pointing to
> it. I copied all the cluster configuration files(hive-site.xml,
> hdfs-site.xml etc
Hi Everyone,
I have some data located on the edge node. Right
now, the process I follow to copy the data from the edge node to HDFS is through
a shell script which resides on the edge node. In Oozie I am using an SSH action
to execute the shell script on the edge node, which copies the dat
Hi Everyone, Any insights on this thread? Thank you.
On Friday, May 27, 2016, Ajay Chander wrote:
> Hi Everyone,
>
>I have some data located on the EdgeNode. Right
> now, the process I follow to copy the data from Edgenode to HDFS is through
> a sh
Hi Vikash,
These are my thoughts: read the input directory using wholeTextFiles(),
which would give a paired RDD with the key as the file name and the value as
the file content. Then you can apply a map function to read each line and
append the key to the content.
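A rough sketch of that idea, assuming the input path is a placeholder:
val files = sc.wholeTextFiles("/data/input")               // RDD[(fileName, fileContent)]
val lines = files.flatMap { case (fileName, content) =>
  content.split("\n").map(line => line + "\t" + fileName)  // append the file name to each line
}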
Thank you,
Aj
On Tuesday, May 31, 2016, Vikash Kumar
Hi Spark users,
Right now we are using Spark for everything (loading the data from
SQL Server, applying transformations, saving it as permanent tables in Hive) in
our environment. Everything is being done in one Spark application.
The only thing we do before we launch our Spark application through
Oozie
ur hd
>> 2. use spark-streaming to read data from that directory and store it
>> into hdfs
>>
>> perhaps there is some sort of spark 'connectors' that allows you to read
>> data from a db directly so you don't need to go via spark streaming?
>>
>>
y on your hd
>>> 2. use spark-streaming to read data from that directory and store it
>>> into hdfs
>>>
>>> perhaps there is some sort of spark 'connectors' that allows you to read
>>> data from a db directly so you don't need to go vi
; But you can maintain a file e.g. extractRange.conf in hdfs , to read from
> it the end range and update it with new end range from spark job before it
> finishes with the new relevant ranges to be used next time.
>
> On Tue, Jun 7, 2016 at 8:49 PM, Ajay Chander > wrote:
>
>>
_dt|
|SR_NO|start_dt|end_dt|
|SR_NO|start_dt|end_dt|
|SR_NO|start_dt|end_dt|
+-----+--------+------+
Since both programs are using the same driver, com.sas.rio.MVADriver, the
expected output should be the same as my pure Java program's output. But
something else is happening behind the scenes.
A
Hi again, has anyone in this group tried to access a SAS dataset through Spark
SQL? Thank you
Regards,
Ajay
On Friday, June 10, 2016, Ajay Chander wrote:
> Hi Spark Users,
>
> I hope everyone here are doing great.
>
> I am trying to read data from SAS through Spark SQL and
9:05,935] INFO ps(2.1)#executeQuery SELECT
"SR_NO","start_dt","end_dt" FROM sasLib.run_control ; created result set
2.1.1; time= 0.102 secs (com.sas.rio.MVAStatement:590)
Please find complete program and full logs attached in the below thread.
Thank you.
Regards,
Ajay
On Fr
ID
> , CLUSTERED
> , SCATTERED
> , RANDOMISED
> , RANDOM_STRING
> , SMALL_VC
> , PADDING
> FROM tmp
> """
>HiveContext.sql(sqltext)
> println ("\nFinished at"); sqlContext.sql("SELE
I tried implementing the same functionality through Scala as well, but no
luck so far. Just wondering if anyone here has tried using Spark SQL to read a
SAS dataset? Thank you
Regards,
Ajay
On Friday, June 10, 2016, Ajay Chander wrote:
> Mich, I completely agree with you. I built another Spark
providing the table name?
Yes, I did that too. It did not make any difference.
Thank you,
Ajay
On Sunday, June 12, 2016, Mohit Jaggi wrote:
> Looks like a bug in the code generating the SQL query…why would it be
> specific to SAS, I can’t guess. Did you try the same with another database?
.
Regards,
Ajay
On Thursday, June 2, 2016, Mich Talebzadeh
wrote:
> thanks for that.
>
> I will have a look
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn *
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <http
Thanks for the confirmation Mich!
On Wednesday, June 22, 2016, Mich Talebzadeh
wrote:
> Hi Ajay,
>
> I am afraid that for now the transaction heartbeat does not work through Spark, so I
> have no other solution.
>
> This is interesting point as with Hive running on Spark engine there
in Spark 2.0? I did search the commits done in the 2.0 branch, and it
looks like I need to use spark.sql.files.openCostInBytes, but I am not sure.
Regards,
Ajay
Hi All,
I am running 3 executors in my Spark Streaming application with 3
cores per executor. I have written a custom receiver for receiving network
data.
In my current configuration I am launching 3 receivers, one receiver per
executor.
At runtime, if 2 of my executors die, I am left
Hi Sparklers,
Can you give us more elaborate documentation of the Spark UI? There are many
fields in it and we do not know much about them.
e the file. Why
don't you look at the HDFS logs and see what's happening when your application is
talking to the NameNode? I suspect some networking issue; also check if the
DataNodes are running fine.
Thank you,
Ajay
On Saturday, October 3, 2015, Jacinto Arias wrote:
> Yes printing the result
Hi Everyone,
Does anyone have any idea if Spark 1.5.1 is available as a service on
HortonWorks? I have Spark 1.3.1 installed on the cluster and it is a
HortonWorks distribution. Now I want to upgrade it to Spark 1.5.1. Does anyone
here have any idea about it? Thank you in advance.
Regards,
Ajay
that I
can get it upgraded through the Ambari UI? If possible, can anyone point me to
documentation online? Thank you.
Regards,
Ajay
On Wednesday, October 21, 2015, Saisai Shao wrote:
> Hi Frans,
>
> You could download Spark 1.5.1-hadoop 2.6 pre-built tarball and copy into
> HDP 2.
Hi Everyone,
I have a use case where I have to create a DataFrame inside the map()
function. To create a DataFrame it needs a sqlContext or hiveContext. Now how
do I pass the context to my map function? And I am doing it in Java. I
tried creating a class "TestClass" which implements "Function"
and i
main(HistoryServer.scala:231)
at org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)
I went to the lib folder and noticed that
"spark-assembly-1.5.1-hadoop2.6.0.jar" is missing that class. I was able to
get the spark history server started with 1.3.1 but not 1.5.1. Any inputs
on this?
Really appreciat
rs, e.g., 4040,
4041, 4042...
Thanks,
Ajay
On Fri, Jul 24, 2015 at 6:21 AM, Joji John wrote:
> *HI,*
>
> *I am getting this error for some of spark applications. I have multiple
> spark applications running in parallel. Is there a limit in the number of
> spark applications that I c
this helps.
Ajay
On Thu, Jul 23, 2015 at 6:40 AM, Chintan Bhatt <
chintanbhatt...@charusat.ac.in> wrote:
> Hi.
> I'm facing following error while running .ova file containing Hortonworks
> with Spark in Oracle VM Virtual Box:
>
> Failed to open a session for the vi
Hi Joji,
To my knowledge, Spark does not offer any such function.
I agree, defining a function to find an open (random) port would be a good
option. However, in order to invoke the corresponding Spark UI one needs
to know this port number.
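A small sketch of such a helper using plain sockets (nothing Spark-specific):
import java.net.ServerSocket

def findFreePort(): Int = {
  val socket = new ServerSocket(0)                // port 0 lets the OS pick a free ephemeral port
  try socket.getLocalPort finally socket.close()
}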
Thanks,
Ajay
On Fri, Jul 24, 2015 at 10:19 AM, Joji
ity response, and if needed, I will open a JIRA item.
I hope it helps.
Regards,
Ajay
On Mon, Aug 3, 2015 at 1:16 PM, Sujit Pal wrote:
> @Silvio: the mapPartitions instantiates a HttpSolrServer, then for each
> query string in the partition, sends the query to Solr using SolrJ, and
> gets
Hi Tim,
An option like spark.mesos.executor.max to cap the number of executors per
node/application would be very useful. However, having an option like
spark.mesos.executor.num
to specify the desired number of executors per node would provide even
better control.
Thanks,
Ajay
On Wed, Aug
s can specify the desired number of executors. If not
available, Mesos (in a simple implementation) can provide/offer whatever is
available. In a slightly complex implementation, we can build a simple
protocol to negotiate.
Regards,
Ajay
On Wed, Aug 12, 2015 at 5:51 PM, Tim Chen wrote:
> You're
e. But somehow it's
not happening. Please tell me if my assumption is wrong or if I am missing
anything here.
I have attached the word count program that I was using. Any help is highly
appreciated.
Thank you,
Ajay
Hi David,
Thanks for responding! My main intention was to submit a Spark job/jar to the
YARN cluster from my Eclipse, within the code. Is there any way that I
could pass my YARN configuration somewhere in the code to submit the jar to
the cluster?
Thank you,
Ajay
On Sunday, August 30, 2015, David
l.com
> > wrote:
>
>> Hi Ajay,
>>
>> In short story: No, there is no easy way to do that. But if you'd like to
>> play around this topic a good starting point would be this blog post from
>> sequenceIQ: blog
>> <http://blog.sequenceiq.com/blog/201
n driver.
Hope this helps!
Ajay
On Mon, Sep 14, 2015 at 1:21 PM, Ankur Srivastava <
ankur.srivast...@gmail.com> wrote:
> Hi Rachana
>
> I didn't get your question fully, but as the error says, you cannot
> perform an RDD transformation or action inside another transformati
ER) AS MAX_MOD_VAL FROM DUAL"' ?
Any pointers are appreciated.
Thanks for your time.
~ Ajay
>> // maropu
>>>>>>
>>>>>>
>>>>>> On Sun, Sep 11, 2016 at 12:37 AM, Mich Talebzadeh <
>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>> Strange that Oracle table of
using IntelliJ IDE, Maven as the build tool, and Java.
Things that I have got working:
- Since the cluster is secured using Kerberos, I had to use a keytab
file to authenticate like below,
System.setProperty("java.security.krb5.conf",
"C:\\Users\\Ajay\\Documents\\Kerb
distinct rows/Animal types
DF2 has a few million rows.
What's the best way to achieve this efficiently using parallelism?
Regards,
Ajay
Right now, I am doing it like below,
import scala.io.Source
val animalsFile = "/home/ajay/dataset/animal_types.txt"
val animalTypes = Source.fromFile(animalsFile).getLines.toArray
for ( anmtyp <- animalTypes ) {
  val distinctAnmTypCount = sqlContext.sql("select count
Hi Ayan,
My Schema for DF2 is fixed but it has around 420 columns (70 Animal type
columns and 350 other columns).
Thanks,
Ajay
On Wed, Oct 5, 2016 at 10:37 AM, ayan guha wrote:
> Is your schema for df2 is fixed? ie do you have 70 category columns?
>
> On Thu, Oct 6, 2016 at 12:50 A
d api, so it will be read sequentially.
>>
>> Furthermore, you are going to need to create a schema if you want to use
>> dataframes.
>>
>> El 5/10/2016 1:53, "Ajay Chander" escribió:
>>
>>> Right now, I am doing it like below,
>>>
=10
and count(distinct(element)) > 10 respectively.
Thanks,
Ajay
On Wed, Oct 5, 2016 at 11:12 AM, ayan guha wrote:
> Hi
>
> You can "generate" a sql through program. Python Example:
>
> >>> schema
> ['id', 'Mammals', 'Birds
your
inputs on this.
$ cat /home/ajay/flds.txt
PHARMY_NPI_ID
ALT_SUPPLIER_STORE_NBR
MAIL_SERV_NBR
spark-shell --name hivePersistTest --master yarn --deploy-mode client
val dataElementsFile = "/home/ajay/flds.txt"
val dataElements = Source.fromFile(dataElementsFile).getLines.to
t.sql("set hive.exec.dynamic.partition.mode=nonstrict")
val dataElementsFile = "hdfs://nameservice/user/ajay/spark/flds.txt"
//deDF has only 61 rows
val deDF =
sqlContext.read.text(dataElementsFile).toDF("DataElement").coalesce(1).distinct().cache()
deDF.wi
apache.spark.sql.hive.HiveContext
and I see it is extending SQLContext, which extends Logging with
Serializable.
Can anyone tell me if this is the right way to use it? Thanks for your time.
Regards,
Ajay
(AsynchronousListenerBus.scala:64)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1181)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63)
Thanks,
Ajay
On Tue, Oct 25, 2016 at 11:45 PM, Jeff Zhang wrote:
>
> In your sample cod
Sunita, thanks for your time. In my scenario, based on each attribute from
deDF (1 column with just 66 rows), I have to query a Hive table and insert
into another table.
Thanks,
Ajay
On Wed, Oct 26, 2016 at 12:21 AM, Sunita Arvind
wrote:
> Ajay,
>
> Afaik Generally these contexts
Sean, thank you for making it clear. It was helpful.
Regards,
Ajay
On Wednesday, October 26, 2016, Sean Owen wrote:
> This usage is fine, because you are only using the HiveContext locally on
> the driver. It's applied in a function that's used on a Scala collection.
>
from quite a while ago. Please let me know if
you need more info. Thanks
Regards,
Ajay
Has anyone used
https://www.codatlas.com/github.com/apache/spark/HEAD/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
to interact with secured Hadoop from Spark?
Thanks,
Ajay
On Mon, Nov 7, 2016 at 4:37 PM, Ajay Chander wrote:
>
> Hi Everyone,
>
> I am trying
eption: Can't get Master
Kerberos principal for use as renewer
sc.textFile("hdfs://vm1.comp.com:8020/user/myusr/temp/file1").collect().foreach(println)
//Getting this error: java.io.IOException: Can't get Master
Kerberos principal for use as renewer
}
}
On Mon,
to add the column in the schema
> that you are using to read.
>
> Regards,
> Gourav
>
> On Sun, Jun 16, 2019 at 2:48 PM wrote:
>
>> Hi Team,
>>
>>
>>
>> Can we have another column which gives the corrupted record reason in
>> permissive mode while reading csv.
>>
>>
>>
>> Thanks,
>>
>> Ajay
>>
>
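For reference, a minimal sketch of what Gourav suggests, using the Spark 2.x CSV reader; the file path and the first two schema fields are placeholders, and note this captures the raw malformed record rather than a reason string:
import org.apache.spark.sql.types._

val schema = new StructType()
  .add("id", IntegerType)
  .add("name", StringType)
  .add("_corrupt_record", StringType)                     // extra column that receives the bad row
val df = spark.read
  .option("mode", "PERMISSIVE")                           // keep malformed rows instead of failing
  .option("columnNameOfCorruptRecord", "_corrupt_record")
  .schema(schema)
  .csv("/path/to/file.csv")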
ether.
Regards,
Ajay
On Friday, April 3, 2015 2:01 AM, Sandy Ryza
wrote:
This isn't currently a capability that Spark has, though it has definitely
been discussed: https://issues.apache.org/jira/browse/SPARK-1061. The primary
obstacle at this point is that Hadoop's
nformation/tips/best-practices in
this regard?
Cheers!
Ajay
and looks correct. But when a single worker is used with two or more
cores, the result seems to be random. Every time, the count of joined records
is different.
Does this sound like a defect, or do I need to take care of something while
using join? I am using Spark 0.9.1.
Regards
Ajay
enough.
Thanks Chen for your observation. I get this problem on a single worker, so there
will not be any mismatch of jars. On two workers, since executor memory gets
doubled, the code works fine.
Regards,
Ajay
On Thursday, June 5, 2014 1:35 AM, Matei Zaharia
wrote:
If this isn’t the probl
Thanks Matei. We have tested the fix and it's working perfectly.
Andrew, we set spark.shuffle.spill=false, but the application goes out of
memory. I think that is expected.
Regards,
Ajay
On Friday, June 6, 2014 3:49 AM, Andrew Ash wrote:
Hi Ajay,
Can you please try running the
Hi All,
Is it possible to map and filter a JavaRDD in a single operation?
Thanks
Thanks Mayur for the clarification.
also explain the behavior of storage level NONE?
Regards,
Ajay
Thanks Jerry.
It looks like a good option, will try it.
Regards,
Ajay
On Friday, July 4, 2014 2:18 PM, "Shao, Saisai" wrote:
Hi Ajay,
StorageLevel OFF_HEAP means you can cache your RDD into Tachyon; the
prerequisite is that you should deploy Tachyon alongside Spark.
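For reference, a one-line sketch of what that looks like in code (not from the original message; rdd is a placeholder):
import org.apache.spark.storage.StorageLevel
rdd.persist(StorageLevel.OFF_HEAP)  // with Tachyon deployed, blocks are cached off-heap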
Yes, it can
Hi,
I did not find any videos on the Apache Spark channel on YouTube yet.
Any idea when these will be made available?
Regards,
Ajay
.
Since no data is cached in Spark, how is the action on C served without
reading data from disk?
Thanks
--Ajay
Yes that is my understanding of how it should work.
But in my case, when I call collect the first time, it reads the data from files
on the disk.
Subsequent collect queries are not reading the data files (verified from the
logs).
On the Spark UI I see only shuffle read and no shuffle write.
think that
Spark is reading all the columns from disk in the case of table1 when it needs
only 3 columns.
How should I make sure that it reads only 3 of the 10 columns from disk?
Regards,
Ajay
Setting spark.sql.hive.convertMetastoreParquet to true has fixed this.
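For anyone searching later, a minimal sketch of setting that flag on a Spark 1.x sqlContext:
sqlContext.setConf("spark.sql.hive.convertMetastoreParquet", "true")
// or equivalently from SQL:
sqlContext.sql("SET spark.sql.hive.convertMetastoreParquet=true")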
Regards,
Ajay
On Tuesday, January 13, 2015 11:50 AM, Ajay Srivastava
wrote:
Hi, I am trying to read a Parquet file using:
val parquetFile = sqlContext.parquetFile("people.parquet")
There is no way
cala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
Any inputs/suggestions to improve job time will be appreciated.
Regards,
Ajay
Thanks RK. I can turn on speculative execution, but I am trying to find out the
actual reason for the delay, as it happens on any node. Any idea about the stack
trace in my previous mail?
Regards,
Ajay
On Thursday, January 15, 2015 8:02 PM, RK wrote:
If you don't want a few slow tas
Thanks Nicos. GC does not contribute much to the execution time of the task. I
will debug it further today.
Regards,
Ajay
On Thursday, January 15, 2015 11:55 PM, Nicos wrote:
Ajay, Unless we are dealing with some synchronization/conditional variable bug
in Spark, try this per tuning