Found this issue reported earlier, but it was bulk closed:
https://issues.apache.org/jira/browse/SPARK-27030
Regards,
Shrikant
On Fri, 22 Sep 2023 at 12:03 AM, Shrikant Prasad
wrote:
> Hi all,
>
> We have multiple Spark jobs running in parallel trying to write into the same
> Hive table
Hi all,
We have multiple Spark jobs running in parallel trying to write into the same
Hive table, with each job writing into a different partition. This was working
fine with Spark 2.3 and Hadoop 2.7.
But after upgrading to Spark 3.2 and Hadoop 3.2.2, these parallel jobs are
failing with a FileNotFoundException
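For reference, a minimal sketch of the write pattern being described, assuming a Hive table partitioned by a date column; the table, paths, and column names are illustrative, and this shows the setup rather than a fix for the failure:

import org.apache.spark.sql.SparkSession

// Each parallel job targets a different partition of the same table.
val spark = SparkSession.builder()
  .appName("partition-writer")
  // Overwrite only the partitions present in the written data,
  // not the whole table (available since Spark 2.3).
  .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
  .enableHiveSupport()
  .getOrCreate()

val df = spark.read.parquet("hdfs:///staging/job1-input") // hypothetical input

// This job writes only its own partition value, e.g. ds=2023-09-22.
df.where("ds = '2023-09-22'")
  .write
  .mode("overwrite")
  .insertInto("mydb.events") // hypothetical partitioned Hive table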
move the session inside main(), not a
> member.
> Or what other explanation do you have? I don't understand.
>
> On Mon, Jan 2, 2023 at 10:10 AM Shrikant Prasad
> wrote:
>
>> If that was the case and the deserialized session would not work, the
>> application would not have
> It silently allowed the object to serialize, though the
> serialized/deserialized session would not work. Now it explicitly fails.
>
> On Mon, Jan 2, 2023 at 9:43 AM Shrikant Prasad
> wrote:
>
>> That's right. But the serialization would be happening in Spark 2.3 also,
because you are trying to use TestMain methods
> in your program.
> This was never correct, but now it's an explicit error in Spark 3. The
> session should not be a member variable.
>
> On Mon, Jan 2, 2023 at 9:24 AM Shrikant Prasad
> wrote:
>
>> Please see these logs. The
executor; that's not the issue. See your stack
> trace, where it clearly happens in the driver.
>
> On Mon, Jan 2, 2023 at 8:58 AM Shrikant Prasad
> wrote:
>
>> Even if I set the master as yarn, it will not have access to the rest of the
>> Spark confs. It will need spark.yar
wrote:
> So call .setMaster("yarn"), per the error
>
> On Mon, Jan 2, 2023 at 8:20 AM Shrikant Prasad
> wrote:
>
>> We are running it in cluster deploy mode with yarn.
>>
>> Regards,
>> Shrikant
>>
>> On Mon, 2 Jan 2023 at 6:15 PM, Steli
ding to where you want to run this
>
> On Mon, 2 Jan 2023 at 14:38, Shrikant Prasad
> wrote:
>
>> Hi,
>>
>> I am trying to migrate a Spark application from Spark 2.3 to 3.0.1.
>>
>> The issue can be reproduced using below sample code:
>>
>>
at TestMain$.<init>(TestMain.scala)
From the exception, it appears that Spark 3 tries to create the Spark session
on the executor as well, whereas it is not created again on the executor in
Spark 2.3.
Can anyone help in identifying why there is this change in behavior?
Thanks and Regards,
Shrikant
--
Regards,
Shrikant Prasad
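To illustrate the "session should not be a member variable" advice from this thread, a minimal sketch with hypothetical names:

import org.apache.spark.sql.SparkSession

object TestMain {
  // Wrong: as a member, the session is initialized when the object is
  // first referenced, including inside closures deserialized on executors.
  // val spark = SparkSession.builder().getOrCreate()

  def main(args: Array[String]): Unit = {
    // Right: create the session inside main(), on the driver only.
    val spark = SparkSession.builder()
      .appName("TestMain")
      .getOrCreate()

    spark.range(10).count() // placeholder workload
    spark.stop()
  }
}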
I have tried with that also. It gives the same exception:
ClassNotFoundException: sequencefile.DefaultSource
Regards,
Shrikant
On Mon, 14 Nov 2022 at 6:35 PM, Jie Han wrote:
> It seems that the name is “sequencefile”.
>
> > On 14 Nov 2022 at 20:59, Shrikant Prasad wrote:
> >
Spark 3.2.
Is there any change in sequence file support in 3.2, or is any code change
required to make it work?
Thanks and regards,
Shrikant
--
Regards,
Shrikant Prasad
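For what it's worth, a sketch of reading a sequence file through the RDD API, which does not depend on any "sequencefile" DataFrame source; the path and key/value types are illustrative:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("seqfile").getOrCreate()

// Sequence files are (key, value) pairs; the Writable types are
// converted to Scala types by SparkContext's implicit converters.
val rdd = spark.sparkContext.sequenceFile[String, String]("hdfs:///data/input.seq")

// Lift into a DataFrame if SQL-style processing is needed.
val df = spark.createDataFrame(rdd).toDF("key", "value")
df.show(5)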
dynamic allocation is available; however, I am not sure how
> it works. The official Spark docs
> <https://spark.apache.org/docs/latest/running-on-kubernetes.html#future-work>
> say that the shuffle service is not yet available.
>
> Thanks
>
> Nikhil
>
--
Regards,
Shrikant Prasad
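As a side note on the docs quoted above: in newer releases (Spark 3.0+), dynamic allocation can work on Kubernetes without the external shuffle service by using shuffle tracking. A hedged configuration sketch, with illustrative values:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("k8s-dynamic-allocation")
  .config("spark.dynamicAllocation.enabled", "true")
  // Track shuffle files on executors so no external shuffle service is
  // needed; executors holding live shuffle data are kept around.
  .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
  .config("spark.dynamicAllocation.minExecutors", "1")
  .config("spark.dynamicAllocation.maxExecutors", "20")
  .getOrCreate()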
Reading from S3 works, but I am getting a 403 Access Denied error while
writing to the KMS-enabled bucket.
I am wondering if I am missing some dependency jars or client configuration
properties.
I would appreciate it if someone could give me a few pointers on this.
Regards,
Prasad Paravatha
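In case it helps, a sketch of the S3A client properties usually involved when writing to an SSE-KMS bucket, assuming the hadoop-aws S3A connector; the key ARN, bucket, and path are placeholders, and the writing role also needs kms:GenerateDataKey/kms:Decrypt on the key:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("s3a-kms-write").getOrCreate()
val hadoopConf = spark.sparkContext.hadoopConfiguration

// Ask S3A to request SSE-KMS on uploads; missing KMS permissions on the
// key (not just S3 permissions) commonly surface as 403 Access Denied.
hadoopConf.set("fs.s3a.server-side-encryption-algorithm", "SSE-KMS")
hadoopConf.set("fs.s3a.server-side-encryption.key",
  "arn:aws:kms:us-east-1:111122223333:key/placeholder-key-id") // placeholder ARN

spark.range(100).write.parquet("s3a://my-kms-bucket/output") // hypothetical bucket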
using very large EMR clusters. I am
trying to find out the CPU utilization and memory utilization of the nodes.
This will help me find out whether the clusters are underutilized so I can
reduce the nodes.
Is there a better way to get these stats without changing the code?
Thanks,
Prasad
Hi Bo Yang,
Would it be something along the lines of Apache Livy?
Thanks,
Prasad
On Tue, Feb 22, 2022 at 10:22 PM bo yang wrote:
> It is not a standalone Spark cluster. In more detail, it deploys a Spark
> Operator (https://github.com/GoogleCloudPlatform/spark-on-k8s-operator)
> and
Hi,
It will require code changes, and I am looking at some third-party code. I
am looking for something I can just hook into the JVM to get the stats.
Thanks,
Prasad
On Thu, Jan 20, 2022 at 11:00 AM Sonal Goyal wrote:
> Hi Prasad,
>
> Have you checked the SparkListener
Hello,
Is there any way we can profile Spark applications that will show the number
of invocations of Spark APIs, their execution times, and so on, just the way
JProfiler shows all the details?
Thanks,
Prasad
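Following up on the SparkListener suggestion above, a minimal sketch that hooks into the running application and reports stage timings without changing job logic; it can also be registered via the spark.extraListeners configuration if the class is on the classpath:

import org.apache.spark.sql.SparkSession
import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}

// Prints how long each stage took; a rough stand-in for per-call timings.
class TimingListener extends SparkListener {
  override def onStageCompleted(event: SparkListenerStageCompleted): Unit = {
    val info = event.stageInfo
    val millis = for {
      start <- info.submissionTime
      end   <- info.completionTime
    } yield end - start
    println(s"Stage ${info.stageId} (${info.name}) took ${millis.getOrElse(-1L)} ms")
  }
}

val spark = SparkSession.builder().appName("profiled-app").getOrCreate()
spark.sparkContext.addSparkListener(new TimingListener)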
https://www.apache.org/dyn/closer.lua/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.3.tgz
FYI, unable to download from this location.
Also, I don't see the Hadoop 3.3 version in the dist
> On Oct 19, 2021, at 9:39 AM, Bode, Meikel, NMA-CFD
> wrote:
>
>
> Many thanks! 😊
>
> From: Gengliang Wang
do my research on this, but please let me know your opinion.
Thanks,
Prasad
On Fri 5 Apr, 2019, 1:09 AM Teemu Heikkilä wrote:
> So basically you could have a base dump/snapshot of the full database, or
> all the required data stored into HDFS or a similar system as partitioned
> files (i.e. orc/p
creating views will help in this scenario?
Can you please tell me if I am thinking in the right direction?
I have two challenges (a sketch for the first follows after this message):
1) First, to load 2-4 TB of data into Spark very quickly.
2) Then, to keep this data updated in Spark whenever DB updates are done.
Thanks,
Prasad
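For challenge (1), a hedged sketch of a parallel bulk load over JDBC; the connection URL, table, credentials, and bounds are all illustrative placeholders:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("oracle-bulk-load").getOrCreate()

// Splits the read into numPartitions range queries on partitionColumn,
// so the 2-4 TB pull runs over many parallel JDBC connections.
val df = spark.read.format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL") // placeholder
  .option("dbtable", "SCHEMA.BIG_TABLE")                 // placeholder
  .option("user", "app_user")
  .option("password", "app_password")
  .option("partitionColumn", "ID") // numeric, ideally indexed column
  .option("lowerBound", "1")
  .option("upperBound", "100000000")
  .option("numPartitions", "64")
  .load()

// Persist a local copy; challenge (2) would then update this copy
// from the change feed.
df.write.mode("overwrite").parquet("hdfs:///warehouse/big_table")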
On Fri, Apr 5, 2019 at 12:35 AM
and write to a file? I am trying to use a kind of data locality in this case.
Whenever data is updated in the Oracle tables, can I refresh the data in
Spark storage? I can get the update feed using messaging technology.
Can someone from the community help me with this?
Suggestions are welcome.
Thanks,
Prasad
Hello,
I got a message saying that messages sent to me (my Gmail ID) from the
mailing list got bounced?
Wonder why?
thanks,
Prasad.
On Mon, Apr 16, 2018 at 6:16 PM, wrote:
> Hi! This is the ezmlm program. I'm managing the
> user@spark.apache.org mailing list.
>
>
> Mess
*Hive:* We run the Hive queries in *sample.hql* and redirect the output to
the output file output_partition.txt.
*Spark:*
Can anyone tell us how to implement this in *Spark SQL*, i.e., executing
the hive.hql file and redirecting the output to one file.
--
------
Regards,
Prasad T
Hi,
I tried the below code:
result.write.csv("home/Prasad/")
It is not working.
It says:
Error: value csv is not a member of org.apache.spark.sql.DataFrameWriter.
Regards
Prasad
On Thu, Jan 19, 2017 at 4:35 PM, smartzjp wrote:
> Because the reduce number will not be one, so it will
the output in the
console.
I need to redirect the output to a local file as well as an HDFS file,
with "|" as the delimiter.
We tried the below code:
result.saveAsTextFile("home/Prasad/result.txt")
It is not working as expected.
--
------
Prasad. T
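Two common workarounds from that era, sketched assuming a Spark 1.x DataFrame named result (the built-in csv writer only arrived in Spark 2.0, which explains the error above):

// Option 1: no extra packages; join each Row's fields with "|"
// and write the lines out with saveAsTextFile.
result.rdd
  .map(_.mkString("|"))
  .saveAsTextFile("hdfs:///user/prasad/result_txt") // hypothetical path

// Option 2: with the external spark-csv package on the classpath
// (com.databricks:spark-csv), the writer gains a delimited format.
result.write
  .format("com.databricks.spark.csv")
  .option("delimiter", "|")
  .save("hdfs:///user/prasad/result_csv") // hypothetical path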
Also, check the column names of df1 (after joining df2 and df3).
Prasad.
From: Ted Yu
Date: Monday, April 25, 2016 at 8:35 PM
To: Divya Gehlot
Cc: "user @spark"
Subject: Re: Cant join same dataframe twice ?
Can you show us the structure of df2 and df3 ?
Thanks
On Mon, Apr 25, 20
(around 1 TB)
I am using Spark version 1.5.2.
Thanks in advance for any insights.
Regards,
Prasad.
Below is the code.
val userAndFmSegment =
userData.as("userdata").join(fmSegmentData.withColumnRenamed("USER_ID",
"FM_USER_ID").as("fmsegmentdata
I am using Spark 1.5.2.
I am not using Dynamic allocation.
Thanks,
Prasad.
On 1/5/16, 3:24 AM, "Ted Yu" wrote:
>Which version of Spark do you use ?
>
>This might be related:
>https://issues.apache.org/jira/browse/S
R_ID")
.withColumnRenamed("USER_CNTRY_ID","USER_DIM_COUNTRY_ID")
.as("userdim")
, userAndRetailDates("USER_ID") <=> $"userdim.USER_DIM_USER_ID"
&& userAndRetailDates("USER_CNTRY_ID") <=> $"us
]
InMemoryColumnarTableScan [List of columns ], true, 1,
StorageLevel(true, true, false, true, 1), (Repartition 1, false), None)
Project [ List of Columns ]
Scan AvroRelation[Avro File][List of Columns]
Code Generation: true
Thanks,
Prasad.
Thanks, Koert.
Regards,
Prasad.
From: Koert Kuipers
Date: Thursday, December 17, 2015 at 1:06 PM
To: Prasad Ravilla
Cc: Anders Arpteg, user
Subject: Re: Large number of conf broadcasts
https://github.com/databricks/spark-avro/pull/95
Hi Anders,
I am running into the same issue as you. I am trying to read about 120
thousand Avro files into a single data frame.
Is your patch part of a pull request from the master branch on GitHub?
Thanks,
Prasad.
From: Anders Arpteg
Date: Thursday, October 22, 2015 at 10:37 AM
To: Koert
I did try that. Same problem.
As you said earlier,
spark.yarn.keytab
spark.yarn.principal
are required.
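For reference, a sketch of how those two settings are typically supplied, either on the conf as below or via the equivalent spark-submit --keytab and --principal flags; the path and principal are placeholders:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.yarn.keytab", "/etc/security/keytabs/app.keytab") // placeholder path
  .set("spark.yarn.principal", "app@EXAMPLE.COM")               // placeholder principal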
On Fri, Dec 4, 2015 at 7:25 PM, Ted Yu wrote:
> Did you try setting "spark.authenticate.secret" ?
>
> Cheers
>
> On Fri, Dec 4, 2015 at 7:07 PM, Prasad Reddy wrote:
Hi All,
I am having a problem accessing the Spark UI while running in spark-client
mode. It works fine in local mode.
It keeps redirecting back to itself, adding /null at the end, and it
ultimately exceeds the URL size limit and returns 500. See the response
below.
I have a feeling that I might be missing
This happened to me as well; putting hive-site.xml inside conf doesn't seem to
work. Instead, I added /etc/hive/conf to SPARK_CLASSPATH and it worked. You can
try this approach.
-Skanda
-Original Message-
From: "guxiaobo1982"
Sent: 25-01-2015 13:50
To: "user@spark.apache.org"
Subjec
Hi ,
Can anyone please help me understand which versions of Hive support
Spark and Shark?
--
--
Regards,
RAVI PRASAD. T
Hi Wisely,
Could you please post your pom.xml here.
Thanks
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Error-reading-HDFS-file-using-spark-0-9-0-hadoop-2-2-0-incompatible-protobuf-2-5-and-2-4-1-tp2158p3770.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Check this thread out,
http://apache-spark-user-list.1001560.n3.nabble.com/Error-reading-HDFS-file-using-spark-0-9-0-hadoop-2-2-0-incompatible-protobuf-2-5-and-2-4-1-tp2158p2807.html
-- you have to remove conflicting Akka and protobuf versions.
Thanks
Prasad.
--
View this message in context
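A hedged build.sbt sketch of the kind of exclusion being suggested, with illustrative coordinates and versions:

// Exclude the protobuf that hadoop-client drags in transitively so only
// one protobuf version remains on the classpath; conflicting Akka
// artifacts can be excluded the same way with their organization id.
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.2.0" excludeAll (
  ExclusionRule(organization = "com.google.protobuf")
)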
hi,
Yes, I did.
SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly
Further, when I use the spark-shell, I can read the same file and it works
fine.
Thanks
Prasad.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Error-reading-HDFS-file-using-spark
Can someone please let me know if you have faced these issues and how you fixed them.
Thanks
Prasad.
Caused by: java.lang.VerifyError: class
org.apache.hadoop.security.proto.SecurityProtos$GetDelegationTokenRequestProto
overrides final method
getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet