Stream is corrupted in ShuffleBlockFetcherIterator

2019-08-15 Thread Mikhail Pryakhin
Hello, Spark community! I've been struggling with my job, which constantly fails because it cannot decompress some previously compressed blocks while shuffling data. I use Spark 2.2.0 with all configuration settings left at their defaults (no specific compression codec is specified). I've

Call Oracle Sequence using Spark

2019-08-15 Thread rajat kumar
Hi All, I have to call an Oracle sequence using Spark. Can you please tell me how to do that? Thanks, Rajat

Memory Limits error

2019-08-15 Thread Dennis Suhari
Hi community, I am using Spark on YARN. When submitting a job, I get an error message after a long time and a retry. It happens when I want to store the dataframe to a table. spark_df.write.option("path", "/nlb_datalake/golden_zone/webhose/sentiment").saveAsTable("news_summary_test",

Spark streaming kafka source delay occasionally

2019-08-15 Thread ans
Using the Kafka consumer with 2-minute batches, tasks generally take 2 to 5 seconds to process, but some tasks take more than 40 seconds. I suspect *CachedKafkaConsumer#poll* could be the problem. private def poll(timeout: Long): Unit = { val p = consumer.poll(timeout) val r = p.records(topicPartition)

Re: Spark Streaming concurrent calls

2019-08-15 Thread Tianlang
Hi, could the number of partitions in the Kafka topic help? On 2019/8/13 at 10:53 PM, Amit Sharma wrote: I am using Kafka with Spark Streaming. My UI application sends requests to the streaming job through Kafka. The problem is that streaming handles one request at a time, so if multiple users send requests at the same time they

Re: help understanding physical plan

2019-08-15 Thread Tianlang
Hi, maybe you can look at the Spark UI; the physical plan does not contain any timing information. On 2019/8/13 at 10:45 PM, Marcelo Valle wrote: Hi, I have a job running on AWS EMR. It's basically a join between 2 tables (Parquet files on S3), one somewhat large (around 50 GB) and the other small (less