Re: Log file location in Spark on K8s

2023-10-09 Thread Prashant Sharma
Hi Sanket, Driver and executor logs are written to stdout by default; this can be configured via the SPARK_HOME/conf/log4j.properties file. The entire SPARK_HOME/conf directory, including that file, is automatically propagated to all driver and executor containers and mounted as a volume. Thanks On Mon, 9 Oct, 2023, 5:37
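The configuration file mentioned above might look like the following minimal sketch. This assumes Log4j 1.x property names, which Spark used through 3.2; newer Spark versions read log4j2.properties instead, so adjust accordingly:

```properties
# Send everything to the console, i.e. the container's stdout
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.out
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Example: quieten a noisy logger
log4j.logger.org.apache.spark.repl.Main=WARN
```

Since the whole conf directory is shipped to the pods, editing this one file changes logging for both the driver and the executors.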

Re: Connection pool shut down in Spark Iceberg Streaming Connector

2023-10-05 Thread Prashant Sharma
Hi Sanket, more details might help here. What does your Spark configuration look like? What exactly was done when this happened? On Thu, 5 Oct, 2023, 2:29 pm Agrawal, Sanket, wrote: > Hello Everyone, > > > > We are trying to stream the changes in our Iceberg tables stored in AWS > S3. We are ac

Re: EOF Exception Spark Structured Streams - Kubernetes

2021-02-01 Thread Prashant Sharma
Hi Sachit, The fix version on that JIRA says 3.0.2, so this fix is not yet released. Soon, there will be a 3.1.1 release; in the meantime you can try out the 3.1.1-rc, which also has the fix, and let us know your findings. Thanks, On Mon, Feb 1, 2021 at 10:24 AM Sachit Murarka wrote: > Followin

Re: Suggestion on Spark 2.4.7 vs Spark 3 for Kubernetes

2021-01-05 Thread Prashant Sharma
A lot of developers may have already moved to 3.0.x. FYI, 3.1.0 is just around the corner, hopefully in a few days, and has a lot of improvements to Spark on K8s, including the transition from experimental to GA in this release. See: https://issues.apache.org/jira/browse/SPARK-33005 Than

Re: Error while running Spark on K8s

2021-01-04 Thread Prashant Sharma
ate.driver.serviceAccountName=spark-sa --conf > spark.kubernetes.container.image=sparkpy local:///opt/spark/da/main.py > > Kind Regards, > Sachit Murarka > > > On Mon, Jan 4, 2021 at 5:46 PM Prashant Sharma > wrote: > >> Hi Sachit, >> >> Can you give more details on how did you

Re: Error while running Spark on K8s

2021-01-04 Thread Prashant Sharma
Hi Sachit, Can you give more details on how you ran it? i.e. the spark-submit command. My guess is, a service account with sufficient privileges is not provided. Please see: http://spark.apache.org/docs/latest/running-on-kubernetes.html#rbac Thanks, On Mon, Jan 4, 2021 at 5:27 PM Sachit Murarka wro
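The RBAC setup referred to above can be sketched roughly as follows, per the linked running-on-kubernetes docs. The namespace and account names here are only illustrative:

```shell
# Create a service account for Spark and grant it the edit role
# (names "spark" and namespace "default" are example choices)
kubectl create serviceaccount spark --namespace=default
kubectl create clusterrolebinding spark-role \
  --clusterrole=edit \
  --serviceaccount=default:spark \
  --namespace=default

# Then tell spark-submit to run the driver pod under that account
spark-submit \
  --master k8s://https://<k8s-apiserver-host>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  ...
```

Without a binding like this, the driver pod cannot create executor pods and fails with authorization errors.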

Re: Spark3 on k8S reading encrypted data from HDFS with KMS in HA

2020-08-19 Thread Prashant Sharma
-dev Hi, I have used Spark with HDFS encrypted with Hadoop KMS, and it worked well. I could not recall whether I had Kubernetes in the mix. Seeing the error, it is not clear what caused the failure. Can I reproduce this somehow? Thanks, On Sat, Aug 15, 2020 at 7:18 PM Michel Su

Re: Spark 3.0 with Hadoop 2.6 HDFS/Hive

2020-07-19 Thread Prashant Sharma
Hi Ashika, Hadoop 2.6 is no longer supported, and since it has not been maintained in the last 2 years, it may have some unpatched security issues. From Spark 3.0 onwards, we no longer support it; in other words, we have modified our codebase in a way that Hadoop 2.6 won't work. However, i

Re: Spark Compatibility with Java 11

2020-07-14 Thread Prashant Sharma
Hi Ankur, Java 11 support was added in Spark 3.0. https://issues.apache.org/jira/browse/SPARK-24417 Thanks, On Tue, Jul 14, 2020 at 6:12 PM Ankur Mittal wrote: > Hi, > > I am using Spark 2.X and need to execute Java 11. It's not able to execute > Java 11 using Spark 2.X. > > Is there any way w

Re: [Spark 3.0 Kubernetes] Does Spark 3.0 support production deployment

2020-07-12 Thread Prashant Sharma
> scalable and dynamic-allocation-enabled for deploying Spark on K8s? Any > suggested github repo or link? > > > > Thanks, > > Vaibhav V > > > > > > *From:* Prashant Sharma > *Sent:* Friday, July 10, 2020 12:57 AM > *To:* user@spark.apache.org

Re: [Spark 3.0 Kubernetes] Does Spark 3.0 support production deployment

2020-07-09 Thread Prashant Sharma
Hi, Whether it is a blocker or not is up to you to decide. But Spark on K8s supports dynamic allocation through a different mechanism, that is, without using an external shuffle service. https://issues.apache.org/jira/browse/SPARK-27963. There are pros and cons of both approaches. The only
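The shuffle-tracking mechanism from SPARK-27963 is enabled with configuration along these lines; this is a sketch, and the executor bounds are placeholder values to tune for your workload:

```properties
spark.dynamicAllocation.enabled=true
# Track shuffle files on executors instead of relying on an external shuffle service,
# so executors holding shuffle data are not removed until that data is no longer needed
spark.dynamicAllocation.shuffleTracking.enabled=true
spark.dynamicAllocation.minExecutors=1
spark.dynamicAllocation.maxExecutors=20
```

The trade-off is that executors retaining shuffle data cannot be reclaimed as aggressively as with an external shuffle service.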

Employment opportunities.

2019-06-12 Thread Prashant Sharma
Hi, My employer (IBM) is interested in hiring people in Hyderabad if they are committers in any of the Apache projects and are interested in Spark and its ecosystem. Thanks, Prashant.

Spark Streaming RDD Cleanup too slow

2018-09-05 Thread Prashant Sharma
I have a Spark Streaming job which takes too long to delete temp RDDs. I collect about 4MM telemetry metrics per minute and do minor aggregations in the streaming job. I am using Amazon R4 instances. The driver RPC call, although async I believe, is slow getting the handle for the future object at "

Re: Spark Structured Streaming not connecting to Kafka using kerberos

2017-10-26 Thread Prashant Sharma
Hi Darshan, Did you try passing the config directly as an option, like this: .option("kafka.sasl.jaas.config", saslConfig) Where saslConfig can look like: com.sun.security.auth.module.Krb5LoginModule required \ useKeyTab=true \ storeKey=true \ keyTab="/etc/security/key
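Put together, the suggestion above might look like the following sketch. The broker address, topic, keytab path, and principal are placeholders, not values from the original thread:

```scala
// Sketch: pass the Kerberos JAAS configuration inline as a Kafka source option,
// instead of distributing a separate jaas.conf file to every node.
val saslConfig =
  """com.sun.security.auth.module.Krb5LoginModule required
    |useKeyTab=true
    |storeKey=true
    |keyTab="/etc/security/keytabs/kafka.keytab"
    |principal="user@EXAMPLE.COM";""".stripMargin

val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9093")      // placeholder broker
  .option("kafka.security.protocol", "SASL_SSL")
  .option("kafka.sasl.jaas.config", saslConfig)
  .option("subscribe", "mytopic")                        // placeholder topic
  .load()
```

Note the `kafka.` prefix on the option names: Spark forwards such options directly to the underlying Kafka consumer.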

Kafka Spark structured streaming latency benchmark.

2016-12-17 Thread Prashant Sharma
Hi, The goal of my benchmark is to arrive at an end-to-end latency lower than 100ms and sustain it over time, by consuming from a Kafka topic and writing back to another Kafka topic using Spark. Since the job does not do aggregation and does constant-time processing on each message, it appeared to me

Re: If we run sc.textfile(path,xxx) many times, will the elements be the same in each partition

2016-11-10 Thread Prashant Sharma
+user -dev Since the same hash-based partitioner is in action by default, in my understanding the same partitioning will happen every time. Thanks, On Nov 10, 2016 7:13 PM, "WangJianfei" wrote: > Hi Devs: > If i run sc.textFile(path,xxx) many times, will the elements be the > same(same elemen

Re: Large files with wholetextfile()

2016-07-12 Thread Prashant Sharma
Hi Baahu, That should not be a problem, given you allocate a sufficient buffer for reading. I was just working on implementing a patch[1] to support the feature of reading whole text files in SQL. This can actually be a slightly better approach, because here we read into off-heap memory for holding data(

Re: Streaming K-means not printing predictions

2016-04-26 Thread Prashant Sharma
Since you are reading from a file stream, I would suggest that instead of printing, you try to save it to a file. There may be output the first time and then no data in subsequent iterations. Prashant Sharma On Tue, Apr 26, 2016 at 7:40 PM, Ashutosh Kumar wrote: > I created a Streaming k means based

Re: Save RDD to HDFS using Spark Python API

2016-04-26 Thread Prashant Sharma
ormat.html is one such formatter class. thanks, Prashant Sharma On Wed, Apr 27, 2016 at 5:22 AM, Davies Liu wrote: > hdfs://192.168.10.130:9000/dev/output/test already exists, so you need > to remove it first. > > On Tue, Apr 26, 2016 at 5:28 AM, Luke Adolph wrote: > > H

Re: Choosing an Algorithm in Spark MLib

2016-04-20 Thread Prashant Sharma
As far as I can understand, your requirements are pretty straightforward and doable with just simple SQL queries. Take a look at Spark SQL in the Spark documentation. Prashant Sharma On Tue, Apr 12, 2016 at 8:13 PM, Joe San wrote: > up vote > down votefavorite &

Re: Spark streaming batch time displayed is not current system time but it is processing current messages

2016-04-19 Thread Prashant Sharma
This can happen if the system time is not in sync. By default, streaming uses SystemClock (it also supports ManualClock), which relies on System.currentTimeMillis() for determining the start time. Prashant Sharma On Sat, Apr 16, 2016 at 10:09 PM, Hemalatha A < hemalatha.amru...@googlemail.com>

Re: [Spark 1.5.2] Log4j Configuration for executors

2016-04-18 Thread Prashant Sharma
Maybe you can try creating it before running the app.

Re: Processing millions of messages in milliseconds -- Architecture guide required

2016-04-18 Thread Prashant Sharma
xml[1] messages. Thanks, Prashant Sharma 1. https://github.com/databricks/spark-xml On Tue, Apr 19, 2016 at 10:31 AM, Deepak Sharma wrote: > Hi all, > I am looking for an architecture to ingest 10 mils of messages in the > micro batches of seconds. > If anyone has worked on sim

Re: Renaming sc variable in sparkcontext throws task not serializable

2016-03-02 Thread Prashant Sharma
This is a known issue: https://issues.apache.org/jira/browse/SPARK-3200 Prashant Sharma On Thu, Mar 3, 2016 at 9:01 AM, Rahul Palamuttam wrote: > Thank you Jeff. > > I have filed a JIRA under the following link : > > https://issues.apache.org/jira/browse/SPARK-13634 >

Re: External JARs not loading Spark Shell Scala 2.11

2015-04-09 Thread Prashant Sharma
This is the jira I referred to: https://issues.apache.org/jira/browse/SPARK-3256. Another reason for not working on it is evaluating the priority between upgrading to Scala 2.11.5 (non-trivial, I suppose, because the REPL has changed a bit) and migrating that patch, which is much simpler. Prashant Sharma On

Re: External JARs not loading Spark Shell Scala 2.11

2015-04-09 Thread Prashant Sharma
planning to work, I can help you ? Prashant Sharma On Thu, Apr 9, 2015 at 3:08 PM, anakos wrote: > Hi- > > I am having difficulty getting the 1.3.0 Spark shell to find an external > jar. I have build Spark locally for Scala 2.11 and I am starting the REPL > as follows: >

UnsatisfiedLinkError related to libgfortran when running MLLIB code on RHEL 5.8

2015-03-03 Thread Prashant Sharma
Hi Folks, We are trying to run the following code from the spark shell in a CDH 5.3 cluster running on RHEL 5.8. spark-shell --master yarn --deploy-mode client --num-executors 15 --executor-cores 6 --executor-memory 12G import org.apache.spark.mllib.recommendation.ALS import org.apache.spa

Re: Bind Exception

2015-01-19 Thread Prashant Sharma
is just a warning. FYI, Spark ignores BindException and probes for the next available port and continues. So your application is fine if that particular error comes up. Prashant Sharma On Tue, Jan 20, 2015 at 10:30 AM, Deep Pradhan wrote: > Yes, I have increased the driver memory in sp

Re: Is it safe to use Scala 2.11 for Spark build?

2014-11-17 Thread Prashant Sharma
Looks like sbt/sbt -Pscala-2.11 is broken by a recent patch for improving the maven build. Prashant Sharma On Tue, Nov 18, 2014 at 12:57 PM, Prashant Sharma wrote: > It is safe in the sense that we would help you with the fix if you run into > issues. I have used it, but since I worked on the

Re: Is it safe to use Scala 2.11 for Spark build?

2014-11-17 Thread Prashant Sharma
/patch-3/docs/building-spark.md Prashant Sharma On Tue, Nov 18, 2014 at 12:19 PM, Jianshi Huang wrote: > Any notable issues for using Scala 2.11? Is it stable now? > > Or can I use Scala 2.11 in my spark application and use Spark dist build > with 2.10 ? > > I'm lookin

Re: Spray client reports Exception: akka.actor.ActorSystem.dispatcher()Lscala/concurrent/ExecutionContext

2014-10-28 Thread Prashant Sharma
spray depends on and use the akka spark depends on. Prashant Sharma On Wed, Oct 29, 2014 at 9:27 AM, Jianshi Huang wrote: > I'm using Spark built from HEAD, I think it uses modified Akka 2.3.4, > right? > > Jianshi > > On Wed, Oct 29, 2014 at 5:53 AM, Mohammed Gulle

Re: Spark SQL reduce number of java threads

2014-10-28 Thread Prashant Sharma
What is the motivation behind this? You can start with the master as local[NO_OF_THREADS]. Reducing the threads in all other places can have unexpected results. Take a look at this: http://spark.apache.org/docs/latest/configuration.html. Prashant Sharma On Tue, Oct 28, 2014 at 2:08 PM, Wanda
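Setting the thread count via the master URL, as suggested above, looks like the following sketch; the count of 4 and the app name are arbitrary examples:

```scala
import org.apache.spark.sql.SparkSession

// Run Spark locally with exactly 4 worker threads;
// local[*] would instead use one thread per available core
val spark = SparkSession.builder()
  .master("local[4]")
  .appName("thread-limit-example")
  .getOrCreate()
```

This bounds the task-execution threads, but as noted above, Spark's internal components still use their own thread pools.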

Re: unable to make a custom class as a key in a pairrdd

2014-10-23 Thread Prashant Sharma
Are you doing this in the REPL? Then there is a bug filed for this; I just can't recall the bug ID at the moment. Prashant Sharma On Fri, Oct 24, 2014 at 4:07 AM, Niklas Wilcke < 1wil...@informatik.uni-hamburg.de> wrote: > Hi Jao, > > I don't really know why this do

Re: Default spark.deploy.recoveryMode

2014-10-15 Thread Prashant Sharma
So if you need those features, you can go ahead and set up one of the Filesystem or ZooKeeper options. Please take a look at: http://spark.apache.org/docs/latest/spark-standalone.html. Prashant Sharma On Wed, Oct 15, 2014 at 3:25 PM, Chitturi Padma < learnings.chitt...@gmail.com> wrote: &

Re: Default spark.deploy.recoveryMode

2014-10-14 Thread Prashant Sharma
[Removing dev lists] You are absolutely correct about that. Prashant Sharma On Tue, Oct 14, 2014 at 5:03 PM, Priya Ch wrote: > Hi Spark users/experts, > > In Spark source code (Master.scala & Worker.scala), when registering the > worker with master, I see the usage of *p

Re: Nested Case Classes (Found and Required Same)

2014-09-12 Thread Prashant Sharma
What is your Spark version? This was fixed, I suppose. Can you try it with the latest release? Prashant Sharma On Fri, Sep 12, 2014 at 9:47 PM, Ramaraju Indukuri wrote: > This is only a problem in shell, but works fine in batch mode though. I am > also interested in how others are solvi

Re: .sparkrc for Spark shell?

2014-09-03 Thread Prashant Sharma
Hey, You can use spark-shell -i sparkrc to do this. Prashant Sharma On Wed, Sep 3, 2014 at 2:17 PM, Jianshi Huang wrote: > To make my shell experience merrier, I need to import several packages, > and define implicit sparkContext and sqlContext. > > Is there a start

Re: spark streaming actor receiver doesn't play well with kryoserializer

2014-07-30 Thread Prashant Sharma
-framework/chill-akka) might help. I am not well aware of how kryo works internally; maybe someone else can throw some light on this. Prashant Sharma On Sat, Jul 26, 2014 at 6:26 AM, Alan Ngai wrote: > The stack trace was from running the Actor count sample directly, without > a

Re: Emacs Setup Anyone?

2014-07-26 Thread Prashant Sharma
s setup it is kinda fast to do either tag prediction at point, which is not accurate, etc., but it's useful. In case you are working on building this (inferior mode for the Spark REPL) for us, I can come up with a wishlist. Prashant Sharma On Sat, Jul 26, 2014 at 3:07 AM, Andrei wrote: > I have neve

Re: ZeroMQ Stream -> stack guard problem and no data

2014-06-03 Thread Prashant Sharma
Hi, What is your ZeroMQ version? It is known to work well with 2.2; the output of `sudo ldconfig -v | grep zmq` would be helpful in this regard. Thanks Prashant Sharma On Wed, Jun 4, 2014 at 11:40 AM, Tobias Pfeiffer wrote: > Hi, > > I am trying to use Spark Streaming (1.0.0) with Ze

Re: when to use broadcast variables

2014-05-02 Thread Prashant Sharma
I'd like to be corrected on this, but I am just trying to say small enough, on the order of a few 100 MBs. Since the variable gets shipped to all nodes, it can be a GB but not GBs, and it also depends on the network. Prashant Sharma On Fri, May 2, 2014 at 6:42 PM, Diana Carroll wrote: > Any

Re: Apache Spark is not building in Mac/Java 8

2014-05-02 Thread Prashant Sharma
I have pasted the link in my previous post. Prashant Sharma On Fri, May 2, 2014 at 4:15 PM, N.Venkata Naga Ravi wrote: > Thanks for your quick reply. > > I tried with a fresh installation, it downloads sbt 0.12.4 only (please > check below logs). So it is not working. Can you tel

Re: Apache Spark is not building in Mac/Java 8

2014-05-02 Thread Prashant Sharma
%3DGJh1g2zxOJd02Wt7L06mCLjo-vwwG9Q%40mail.gmail.com%3E Prashant Sharma On Fri, May 2, 2014 at 3:56 PM, N.Venkata Naga Ravi wrote: > > Hi, > > > I am tyring to build Apache Spark with Java 8 in my Mac system ( OS X > 10.8.5) , but getting following exception. > Please help on resolving

Re: 答复: Issue during Spark streaming with ZeroMQ source

2014-04-29 Thread Prashant Sharma
Well, that is not going to be easy, simply because we depend on akka-zeromq for ZeroMQ support. And since Akka does not support the latest ZeroMQ library yet, I doubt there is something simple that can be done to support it. Prashant Sharma On Tue, Apr 29, 2014 at 2:44 PM, Francis.Hu wrote

Re: Issue during Spark streaming with ZeroMQ source

2014-04-29 Thread Prashant Sharma
zeromq 2.2.0, and if you have the jzmq libraries installed, performance is much better. Prashant Sharma On Tue, Apr 29, 2014 at 12:29 PM, Francis.Hu wrote: > Hi, all > > > > I installed spark-0.9.1 and zeromq 4.0.1 , and then run below example: > >

Re: Need help about how hadoop works.

2014-04-24 Thread Prashant Sharma
It is the same file, and the hadoop library that we use for splitting takes care of assigning the right split to each node. Prashant Sharma On Thu, Apr 24, 2014 at 1:36 PM, Carter wrote: > Thank you very much for your help Prashant. > > Sorry I still have another question about yo

Re: Need help about how hadoop works.

2014-04-23 Thread Prashant Sharma
Prashant Sharma On Thu, Apr 24, 2014 at 12:15 PM, Carter wrote: > Thanks Mayur. > > So without Hadoop and any other distributed file systems, by running: > val doc = sc.textFile("/home/scalatest.txt",5) > doc.count > we can only get parallelization within

Re: standalone vs YARN

2014-04-15 Thread Prashant Sharma
Hi Ishaaq, answers inline from what I know; I'd like to be corrected though. On Tue, Apr 15, 2014 at 5:58 PM, ishaaq wrote: > Hi all, > I am evaluating Spark to use here at my work. > > We have an existing Hadoop 1.x install which I planning to upgrade to > Hadoop > 2.3. > > This is not reall

Re: Guidelines for Spark Cluster Sizing

2014-04-03 Thread Prashant Sharma
for a memory-friendly workload. I think it would be good to post experiences, and then that can eventually become some sort of guideline. Prashant Sharma On Thu, Apr 3, 2014 at 1:36 PM, Sonal Goyal wrote: > Hi, > > My earlier email did not get any response, I am looking for some &g

Re: K-means faster on Mahout then on Spark

2014-03-25 Thread Prashant Sharma
I think Mahout uses FuzzyKMeans, which is a different algorithm and is not iterative. Prashant Sharma On Tue, Mar 25, 2014 at 6:50 PM, Egor Pahomov wrote: > Hi, I'm running benchmark, which compares Mahout and SparkML. For now I > have next results for k-means: > Number of

Re: Custom RDD

2014-03-10 Thread Prashant Sharma
Hi David, There are many implementations of RDD available in org.apache.spark. All you have to do is extend the RDD class. Of course, this is not possible from Java, AFAIK. Prashant Sharma On Tue, Mar 11, 2014 at 1:00 AM, David Thomas wrote: > Is there any guide available on creating a cus
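A minimal custom RDD, sketched from the suggestion above. This toy example yields a fixed range of numbers from a single partition; the class name and behavior are illustrative only:

```scala
import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// A toy single-partition RDD that produces the numbers 0 until n.
// `Nil` means this RDD has no parent dependencies.
class RangeLikeRDD(sc: SparkContext, n: Int) extends RDD[Int](sc, Nil) {

  // Describe how the data is split; here, one partition only.
  override protected def getPartitions: Array[Partition] =
    Array(new Partition { override def index: Int = 0 })

  // Produce the records for a given partition.
  override def compute(split: Partition, context: TaskContext): Iterator[Int] =
    Iterator.range(0, n)
}
```

The two required overrides are `getPartitions` and `compute`; real implementations (see the existing RDDs in org.apache.spark.rdd) additionally override `getPreferredLocations` and `getDependencies` where relevant.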

Re: Implementing a custom Spark shell

2014-02-28 Thread Prashant Sharma
You can enable debug logging for the REPL; thankfully it uses Spark's logging framework. The trouble must be with the wrappers. Prashant Sharma On Fri, Feb 28, 2014 at 12:29 PM, Sampo Niskanen wrote: > Hi, > > Thanks for the pointers. I did get my code working within the normal > spark-she