At the following link you will find all the required details:
https://aws.amazon.com/blogs/containers/best-practices-for-running-spark-on-amazon-eks/
Let me know if you require further information.
Regards,
Vaquar khan
On Mon, May 15, 2023, 10:14 PM Mich Talebzadeh
wrote:
> Couple of points
>
I saw you are looking for Holden's video. Please find the following link:
https://www.oreilly.com/library/view/debugging-apache-spark/9781492039174/
Regards,
Vaquar khan
On Sun, Mar 12, 2023, 6:56 PM Mich Talebzadeh
wrote:
> Hi Denny,
>
> Thanks for the offer. How do you envisage that struct
@Gourav Sengupta, why are you sending unnecessary emails? If you think
Snowflake is good, please use it; the question here was different, and you
are talking about a totally different topic.
Please respect the group guidelines.
Regards,
Vaquar khan
On Wed, Dec 28, 2022, 10:29 AM vaquar khan wrote:
> Here you can f
Here you can find all the details. You just need to pass a Spark DataFrame;
Deequ also generates recommendations for rules, and you can write custom
complex rules.
https://aws.amazon.com/blogs/big-data/test-data-quality-at-scale-with-deequ/
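For example, a minimal Scala sketch (the DataFrame df and the column names
"id" and "amount" are just placeholders for your own data):

import com.amazon.deequ.VerificationSuite
import com.amazon.deequ.checks.{Check, CheckLevel, CheckStatus}

val result = VerificationSuite()
  .onData(df)                          // your existing Spark DataFrame
  .addCheck(
    Check(CheckLevel.Error, "basic checks")
      .isComplete("id")                // "id" has no nulls
      .isUnique("id")                  // "id" has no duplicates
      .isNonNegative("amount"))        // "amount" is never negative
  .run()

if (result.status != CheckStatus.Success) println("Data quality checks failed")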
Regards,
Vaquar khan
On Wed, Dec 28, 2022, 9:40 AM
I would suggest Deequ; I have implemented it many times, and it is easy and effective.
Regards,
Vaquar khan
On Tue, Dec 27, 2022, 10:30 PM ayan guha wrote:
> The way I would approach is to evaluate GE, Deequ (there is a python
> binding called pydeequ) and others like Delta Live tables with expect
e. Thank you very much, my friends :)
>
> [1] https://stackoverflow.com/q/66933229/1305344
>
> Regards,
> Jacek Laskowski
>
> https://about.me/JacekLaskowski
> "The Internals Of" Online Books <https://books.japila.pl/>
> Follow me on https://twitter.com/jaceklasko
Hi Pedro,
What is your use case? Why did you use coalesce()? Shuffle-based
repartitioning is a very expensive operation, as it moves data across many
partitions, so try to minimize it as much as possible.
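To illustrate the difference, a small sketch (df and the partition counts
are placeholders):

// coalesce(n) only merges existing partitions, avoiding a full shuffle,
// so it is the cheap way to reduce the partition count
val fewer = df.coalesce(10)

// repartition(n) always does a full shuffle; use it sparingly, e.g. to
// increase the partition count or rebalance skewed data
val rebalanced = df.repartition(100)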
Regards,
Vaquar khan
On Thu, Mar 18, 2021, 5:47 PM Pedro Tuero wrote:
> I was reviewin
Hi Yang,
Please find the following link:
https://stackoverflow.com/questions/63677736/spark-application-as-a-rest-service/63678337#63678337
Regards,
Vaquar khan
On Wed, Nov 25, 2020 at 12:40 AM Sonal Goyal wrote:
> You should be able to supply the --conf and its values as part of appA
Hi Swetha,
It would be great if you asked the same question on Stack Overflow; we have
a very active community there and monitor it for Spark questions.
If you ask the question there, other people with similar problems will also
benefit from the answer.
Regards,
Vaquar khan
On Sun, Sep 29, 2019, 10:26 PM swetha
Hi Deepak,
You can use textFileStream:
https://spark.apache.org/docs/2.2.0/streaming-programming-guide.html
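A minimal sketch of how it is used (the batch interval and directory are
placeholders, and sc is an existing SparkContext):

import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(10))
val lines = ssc.textFileStream("hdfs:///incoming/")  // watches for new files
lines.foreachRDD(rdd => println(s"got ${rdd.count()} new lines"))
ssc.start()
ssc.awaitTermination()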
Please start using Stack Overflow to ask questions, so that other people
also get the benefit of the answers.
Regards,
Vaquar khan
On Sun, Jun 9, 2019, 8:08 AM Deepak Sharma wrote:
> I am using sp
Sure, let me check the JIRA.
Regards,
Vaquar khan
On Thu, Jun 21, 2018, 4:42 PM Takeshi Yamamuro
wrote:
> In this ticket SPARK-24201, the ambiguous statement in the doc had been
> pointed out.
> Can you make a PR for that?
>
> On Fri, Jun 22, 2018 at 6:17 AM, vaquar khan
>
sion (2.11.x).
Regards,
Vaquar khan
On Thu, Jun 21, 2018 at 11:56 AM, chriswakare <
chris.newski...@intellibridge.co> wrote:
> Hi Rahul,
> This will work only in Java 8.
> Installation does not work with versions 9 or 10.
>
> Thanks,
> Christopher
>
>
>
>
https://databricks.com/blog/2015/05/28/tuning-java-garbage-collection-for-spark-applications.html
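If you want to try switching to G1GC, which the post above discusses for
large heaps, here is a sketch of setting the executor JVM flags
programmatically (the exact flag set is just an assumption to tune for your
own job):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.executor.extraJavaOptions",
    "-XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCDateStamps")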
Regards,
Vaquar khan
On Wed, Jun 20, 2018, 1:18 AM Aakash Basu
wrote:
> Hi guys,
>
> I just wanted to know, why my ParallelGC (*--conf
> "spark.executor.extraJavaOptions=-XX:+UseP
Why do you need a tool? You can connect to HBase directly using Spark.
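For example, a sketch using the standard Hadoop input format, with no
third-party connector ("my_table" is a hypothetical table name and sc is an
existing SparkContext):

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

val hbaseConf = HBaseConfiguration.create()
hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table")

val rdd = sc.newAPIHadoopRDD(hbaseConf, classOf[TableInputFormat],
  classOf[ImmutableBytesWritable], classOf[Result])
println(rdd.count())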
Regards,
Vaquar khan
On Jun 18, 2018 4:37 PM, "Lian Jiang" wrote:
Hi,
I am considering tools to load hbase data using spark. One choice is
https://github.com/Huawei-Spark/Spark-SQL-on-HBase. However, this seems to
be o
persist or any other logical separation in pipeline.
Regards,
Vaquar khan
On Sun, Jun 17, 2018 at 5:25 AM, Eyal Zituny wrote:
> Hi Akash,
> such errors might appear in large spark pipelines, the root cause is a
> 64kb jvm limitation.
> the reason that your job isn't failing at
Hi Aakash,
Please check Stack Overflow:
https://stackoverflow.com/questions/41098953/codegen-grows-beyond-64-kb-error-when-normalizing-large-pyspark-dataframe
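The usual workaround is to break the lineage partway through the pipeline,
e.g. with checkpoint() or persist(); a sketch (wideDf and the directory are
placeholders):

spark.sparkContext.setCheckpointDir("/tmp/checkpoints")
val truncated = wideDf.checkpoint()  // materializes and cuts the lineage, so
                                     // codegen restarts from a smaller plan
// ...continue the remaining transformations on truncated...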
Regards,
Vaquar khan
On Sat, Jun 16, 2018 at 3:27 PM, Aakash Basu
wrote:
> Hi guys,
>
> I'm getting an error wh
Please check your JAVA_HOME path.
There may be a special character or a space in the path.
Regards,
Vaquar khan
On Sat, Jun 16, 2018, 1:36 PM Raymond Xie wrote:
> I am trying to run spark-shell in Windows but receive error of:
>
> \Java\jre1.8.0_151\bin\java was unexpected at this time.
>
&
lions of records will be big delay in
response.
Regards,
Vaquar khan
On Mon, Jun 11, 2018, 2:59 AM Teemu Heikkilä wrote:
> So you are now providing the data on-demand through spark?
>
> I suggest you change your API to query from cassandra and store the
> results from Spark back th
https://stackoverflow.com/questions/26562033/how-to-set-apache-spark-executor-memory
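A sketch of applying the settings from that answer programmatically (the
sizes are placeholders to tune for your conversion job):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("json-to-orc")                // hypothetical app name
  .config("spark.executor.memory", "4g")
  .getOrCreate()

Note that spark.driver.memory only takes effect when passed to spark-submit,
because the driver JVM has already started by the time this code runs.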
Regards,
Vaquar khan
On Mon, Nov 13, 2017 at 6:22 PM, Alec Swan wrote:
> Hello,
>
> I am using the Spark library to convert JSON/Snappy files to ORC/ZLIB
> format. Effectively, my Java service
Confirmed: you can use accumulators :)
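A minimal sketch (rdd is a placeholder RDD[String] and the condition is
hypothetical); one caveat: updates made inside transformations can be
re-applied on task retries, so for exact counts do the update inside an
action such as foreach:

val flagged = spark.sparkContext.longAccumulator("flagged")
rdd.foreach { record =>
  if (record.contains("bad")) flagged.add(1)  // toggle/count on your condition
}
println(flagged.value)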
Regards,
Vaquar khan
On Mon, Nov 13, 2017 at 10:58 AM, Kedarnath Dixit <
kedarnath_di...@persistent.com> wrote:
> Hi,
>
>
> We need some way to toggle the flag of a variable in transformation.
>
>
> We are thinking to make use
as an argument of textFile the path
of the file in the worker filesystem.
Regards,
Vaquar khan
On Fri, Sep 29, 2017 at 2:00 PM, JG Perrin wrote:
> On a test system, you can also use something like
> Owncloud/Nextcloud/Dropbox to insure that the files are synchronized. Would
> not do it
http://spark.apache.org/docs/latest/sql-programming-guide.html#migration-guide
Regards,
Vaquar khan
On Fri, Sep 22, 2017 at 4:41 PM, Gokula Krishnan D
wrote:
> Thanks for the reply. Forgot to mention that, our Batch ETL Jobs are in
> Core-Spark.
>
>
> On Sep 22, 2017, at 3:13 PM
entered into maintenance mode.
Regards,
Vaquar khan
On Sat, Sep 23, 2017 at 4:04 PM, Koert Kuipers wrote:
> our main challenge has been the lack of support for missing values
> generally
>
> On Sat, Sep 23, 2017 at 3:41 AM, Irfan Kabli
> wrote:
>
>> Dear All,
>>
>&
Process spark = new SparkLauncher()
    .setAppResource("/my/app.jar")
    .setMainClass("my.spark.app.Main")
    .setMaster("local")
    .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
    .launch();
spark.waitFor();
*Note:* A user application is launched using the bin/spark-submit script.
This script takes care of setting up the classpath with Spark and its
dependencies, and can support different cluster managers.
http://ampcamp.berkeley.edu/6/exercises/time-series-tutorial-taxis.html
Regards,
Vaquar khan
On Wed, Aug 30, 2017 at 1:21 PM, Irving Duran
wrote:
> I think it will work. Might want to explore spark streams.
>
>
> Thank You,
>
> Irving Duran
>
> On Wed, Aug 30, 2017 at 10:50 AM,
The following error you are getting is because of a dependency mismatch.
Regards,
vaquar khan
On Jul 17, 2017 3:50 AM, "zzcclp" <441586...@qq.com> wrote:
Hi guys:
I am using spark 2.1.1 to test on CDH 5.7.1, when i run on yarn with
following command, error
dashboards. In fact, you can apply Spark’s machine learning
<https://spark.apache.org/docs/latest/ml-guide.html> and graph processing
<https://spark.apache.org/docs/latest/graphx-programming-guide.html> algorithms
on data streams.
Regards,
Vaquar khan
On Sun, Jun 11, 2017 at 3:12 AM,
g for memory growth). A simple check
that the file can be read would be:
sc.textFile(file, numPartitions).count()
You can find a good explanation here:
https://stackoverflow.com/questions/29011574/how-does-partitioning-work-for-data-from-files-on-hdfs
Regards,
Vaquar khan
On Jun 11, 2017 5:
Avoid groupByKey and use reduceByKey instead.
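Both can compute per-key aggregates, but reduceByKey combines values
map-side before the shuffle, while groupByKey ships every value across the
network; a quick sketch:

val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
val slow = pairs.groupByKey().mapValues(_.sum)  // shuffles all the values
val fast = pairs.reduceByKey(_ + _)             // pre-aggregates on the map side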
Regards,
Vaquar khan
On Jun 4, 2017 8:32 AM, "Guy Cohen" wrote:
> Try this one:
>
> df.groupBy(
> when(expr("field1='foo'"),"field1").when(expr("field2='bar'"),"field2"))
>
://spark.apache.org/docs/1.1.0/submitting-applications.html
Also try to avoid functions that need a lot of driver memory, like collect().
Regards,
Vaquar khan
On Jun 4, 2017 5:46 AM, "Abdulfattah Safa" wrote:
I'm working on Spark with Standalone Cluster mode. I need to increase the
Driver Memory as I
Hi,
Please check your firewall security settings. Sharing one good link:
http://belablotski.blogspot.in/2016/01/access-hive-tables-from-spark-using.html?m=1
Regards,
Vaquar khan
On Jun 8, 2017 1:53 AM, "Patrik Medvedev" wrote:
> Hello guys,
>
> Can somebody help
It depends on programming style. I would suggest setting up a few rules to
avoid overly complex Scala code and, if needed, asking the programmer to add
proper comments.
Regards,
Vaquar khan
On Jun 8, 2017 4:17 AM, "JB Data" wrote:
> Java is Object langage borned to Data, Python is Data
Hi Ayan,
If you have multiple files (for example, 12 files) and you use the following
code, then you will get 12 partitions.
r = sc.textFile("file://my/file/*")
I'm not sure what you want to know about the file system; please check the API doc.
Regards,
Vaquar khan
On Jun 8, 2017 10:44 AM, "ay
You can add a filter, or replace nulls with a value like 0 or a string:
df.na.fill(0, Seq("y"))
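And for the filter option, a one-line sketch (again assuming the column is "y"):

df.filter(col("y").isNotNull)  // needs import org.apache.spark.sql.functions.col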
Regards,
Vaquar khan
On Jun 2, 2017 11:25 AM, "Alonso Isidoro Roman" wrote:
not sure if this can help you, but you can infer programmatically the
schema providing a json schema file,
Hi,
I found the following two links helpful and am sharing them with you:
http://stackoverflow.com/questions/38353524/how-to-ensure-partitioning-induced-by-spark-dataframe-join
http://spark.apache.org/docs/latest/configuration.html
Regards,
Vaquar khan
On Wed, Mar 29, 2017 at 2:45 PM, Vidya Sujeet
Please read the Spark documentation at least once before asking questions:
http://spark.apache.org/docs/latest/streaming-programming-guide.html
http://2s7gjr373w3x22jf92z99mgm5w-wpengine.netdna-ssl.com/wp-content/uploads/2015/11/spark-streaming-datanami.png
Regards,
Vaquar khan
On Fri, Mar 10, 2017
/content/troubleshooting/javaionotserializableexception.html
Regards,
Vaquar khan
On Fri, Feb 17, 2017 at 9:36 PM, Darshan Pandya
wrote:
> Hello,
>
> I am getting the famous serialization exception on running some code as
> below,
>
> val correctColNameUDF = udf(getNewColumnNam
Did you try MSCK REPAIR TABLE?
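You can run it from Spark SQL directly; a sketch ("my_table" is a placeholder):

spark.sql("MSCK REPAIR TABLE my_table")  // re-syncs partitions written
                                         // directly to the table directory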
Regards,
Vaquar Khan
On Feb 6, 2017 11:21 AM, "KhajaAsmath Mohammed"
wrote:
> I dont think so, i was able to insert overwrite other created tables in
> hive using spark sql. The only problem I am facing is, spark is not able
> to recog
Hi Asmath,
Try refreshing the table:
// spark is an existing SparkSession
spark.catalog.refreshTable("my_table")
http://spark.apache.org/docs/latest/sql-programming-guide.html#metadata-refreshing
Regards,
Vaquar khan
On Sun, Feb 5, 2017 at 7:19 PM, KhajaAsmath Mohammed &l
https://databricks.gitbooks.io/databricks-spark-reference-applications/content/timeseries/index.html
Regards,
Vaquar khan
On Wed, Jan 11, 2017 at 10:07 AM, Dirceu Semighini Filho <
dirceu.semigh...@gmail.com> wrote:
> Hello Rishabh,
> We have done some forecasting, for time-series,
Hi Deepak,
Could you share the index information from your database?
select * from indexInfo;
Regards,
Vaquar khan
On Sat, Dec 17, 2016 at 2:45 PM, Holden Karau wrote:
> How many workers are in the cluster?
>
> On Sat, Dec 17, 2016 at 12:23 PM Deepak Sharma
> wrote:
>
>
Hi Kant,
I hope the following information will help.
1)Cluster
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-standalone.html
http://spark.apache.org/docs/latest/hardware-provisioning.html
2) Yarn vs Mesos
https://www.linkedin.com/pulse/mesos-compare-yarn-vaquar-khan
For that kind of issue, the Spark UI and DAG visualization are always helpful:
https://databricks.com/blog/2015/06/22/understanding-your-spark-application-through-visualization.html
Regards,
Vaquar khan
On Fri, Dec 16, 2016 at 11:10 AM, Vikas K. wrote:
> Unsubscribe.
>
> On Fri, Dec 16, 2016 a
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-rdd-partitions.html
Regards,
vaquar khan
On Wed, Dec 14, 2016 at 12:15 PM, Vaibhav Sinha
wrote:
> Hi,
> I see a similar behaviour in an exactly similar scenario at my deployment
> as well. I am using scal
I'm not sure about your 0-and-1 flag logic, but you can order the data by
time and take the first value.
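A sketch using a window function, based on the column names in the schema
you describe below (take the latest row per id by date):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

val w = Window.partitionBy("id").orderBy(col("date").desc)
val latest = df.withColumn("rn", row_number().over(w))
  .filter(col("rn") === 1)
  .drop("rn")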
Regards,
Vaquar khan
On Wed, Dec 14, 2016 at 10:49 PM, Milin korath
wrote:
> Hi
>
> I have a spark data frame with following structure
>
> id flag price date
>
Hi Neeraj,
As per my understanding, Spark SQL doesn't support UPDATE statements.
Why do you need an UPDATE command in Spark SQL? You can run the command in
Hive instead.
Regards,
Vaquar khan
On Mon, Dec 12, 2016 at 10:21 PM, Niraj Kumar wrote:
> Hi
>
>
>
> I am working on SpqrkSQL using
I found the following links good, as I am using the same:
http://spark.apache.org/docs/latest/tuning.html
https://spark-summit.org/2014/testing-spark-best-practices/
Regards,
Vaquar khan
On 8 Aug 2016 10:11, "Deepak Sharma" wrote:
> Hi All,
> Can anyone please give any documents t
Hi Asfandyar,
A *NoSuchMethodError* in Java means you compiled against one version of code
and executed against a different version.
Please make sure your Java version and the versions of the dependencies you
add are consistent between compile time and runtime.
regards,
vaquar khan
On Fri, Jun 10, 2016 at 4:50 AM, Asfandyar
.0.1, and then I tried “ifconfig l0 down” and the
>> Worker IP address become 127.0.1.1.
>>
>> What should I do to make IP use the IP address of the Ethernet instead of
>> the address of the wireless?
>>
>> Thanks
>>
>> Jay
>>
>>
>>
>> Sent from Mail <https://go.microsoft.com/fwlink/?LinkId=550986> for
>> Windows 10
>>
>>
>>
>
>
--
Regards,
Vaquar Khan
+91 830-851-1500
Hi Sharad,
The array size that you (or the serializer) are trying to allocate is just
too big for the JVM.
You can also split your input further by increasing parallelism.
The following is a good explanation:
https://plumbr.eu/outofmemoryerror/requested-array-size-exceeds-vm-limit
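A sketch of the two usual knobs for that (the path and counts are placeholders):

val data = sc.textFile("hdfs:///big/input", 1000)  // raise minPartitions at read time
val finer = data.repartition(2000)                 // or split an existing RDD further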
regards,
Vaquar khan
On
/client/src/main/resources/spark-action-0.1.xsd
regards,
Vaquar Khan
On Wed, Jun 8, 2016 at 5:26 AM, karthi keyan
wrote:
> Hi ,
>
> Make sure you have oozie 4.2.0 and configured with either yarn / mesos
> mode.
>
> Well, you just parse your scala / Jar file i
Spark Streaming or do an
incremental select to make sure your Spark SQL tables stay up to date with
your production databases
Regards,
Vaquar khan
On 7 Jun 2016 10:29, "Deepak Sharma" wrote:
I am not sure if Spark provides any support for incremental extracts
inherently.
But you can
Hi Chintan,
Please share the complete code and the error logs, and also mention the
version of Spark, etc.
If you are new to Spark development, please start with the following doc:
http://spark.apache.org/docs/latest/quick-start.html
Regards,
Vaquar khan
On 3 Oct 2015 20:39, "Ted Yu" wrote:
> Please
Hi Abhishek,
Please learn Spark; there are no shortcuts to success.
Regards,
Vaquar khan
On 29 Jul 2015 11:32, "Mishra, Abhishek" wrote:
> Hello,
>
> Please help me with links or some document for Apache Spark interview
> questions and answers. Also for the tools re
I would suggest studying Spark, Flink, and Storm, and then preparing your
research paper based on your own understanding and findings.
Maybe you will invent a new Spark ☺
Regards,
Vaquar khan
On 16 Jul 2015 00:47, "Michael Segel" wrote:
> Silly question…
>
> When thinking about a PhD thesis… do
My choice is Java 8.
On 15 Jul 2015 18:03, "Alan Burlison" wrote:
> On 15/07/2015 08:31, Ignacio Blasco wrote:
>
> The main advantage of using scala vs java 8 is being able to use a console
>>
>
> https://bugs.openjdk.java.net/browse/JDK-8043364
>
> --
> Alan Burlison
> --
>
> ---
Totally agreed with Hafsa: you need to identify your requirements and needs
before choosing Spark.
If you want to handle data with fast access, go with a NoSQL store (MongoDB,
Aerospike, etc.); if you need data analytics, then Spark is best.
Regards,
Vaquar khan
On 14 Jul 2015 20:39, "Hafsa Asif" wr
I am using SBT
On 26 Jan 2015 15:54, "Luke Wilson-Mawer" wrote:
> I use this: http://scala-ide.org/
>
> I also use Maven with this archetype:
> https://github.com/davidB/scala-archetype-simple. To be frank though, you
> should be fine using SBT.
>
> On Sat, Jan 24, 2015 at 6:33 PM, riginos wrote
:) good one
On 10 Mar 2014 23:21, "arjun biswas" wrote:
> Hello ,
>
> My name is Arjun and i am 30 years old and I was inquiring about the room
> ad that you have put up on craigslist in Aptos. I am very much interested
> in the room and can move in pretty early . My annual income is around 105K