Re: Spark SQL reads all leaf directories on a partitioned Hive table

2019-08-14 Thread Hao Ren
Thank you, Subash. It works! On Tue, Aug 13, 2019 at 5:58 AM Subash Prabakar wrote: > I had a similar issue reading an external Parquet table. In my case I > had a permission issue on one partition, so I added a filter to exclude that > partition, but Spark still didn't prune it. Then I read t
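Partition pruning on a Hive-style layout comes down to parsing the `key=value` segments of each leaf directory and keeping only the directories whose partition values satisfy the filter, so an excluded (e.g. unreadable) partition is never touched. The following is a conceptual sketch in plain Python, not Spark's implementation; the table path, the `dt` column, and the helper names are hypothetical.

```python
from typing import Callable, Dict, List


def parse_partition_spec(path: str) -> Dict[str, str]:
    """Extract Hive-style key=value segments from a directory path."""
    spec: Dict[str, str] = {}
    for segment in path.strip("/").split("/"):
        if "=" in segment:
            key, value = segment.split("=", 1)
            spec[key] = value
    return spec


def prune_partitions(
    leaf_dirs: List[str], predicate: Callable[[Dict[str, str]], bool]
) -> List[str]:
    """Keep only the leaf directories whose partition values pass the predicate."""
    return [d for d in leaf_dirs if predicate(parse_partition_spec(d))]


# Hypothetical table layout; the last partition stands in for one we cannot read.
leaf_dirs = [
    "/warehouse/events/dt=2019-08-11",
    "/warehouse/events/dt=2019-08-12",
    "/warehouse/events/dt=2019-08-13",
]
kept = prune_partitions(leaf_dirs, lambda spec: spec["dt"] != "2019-08-13")
print(kept)
```

The point of the sketch is the ordering: the filter on the partition column is evaluated against the directory names before any file in a directory is listed or opened.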

Re: Release Apache Spark 2.4.4

2019-08-14 Thread Holden Karau
That looks like more of a feature than a bug fix unless I’m missing something? On Tue, Aug 13, 2019 at 11:58 PM Hyukjin Kwon wrote: > Adding Shixiong > > WDYT? > > On Wed, Aug 14, 2019 at 2:30 PM Terry Kim wrote: > >> Can the following be included? >> >> [SPARK-27234][SS][PYTHON] Use InheritableThread

Spark Structured Streaming XML content

2019-08-14 Thread Nick Dawes
I'm trying to analyze data using a Kinesis source in PySpark Structured Streaming on Databricks. I created a DataFrame as shown below. kinDF = spark.readStream.format("kinesis").option("streamName", "test-stream-1").load() Converted the data from base64 encoding as below. df = kinDF.withColumn("xml_data
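Once the payload is decoded, each record still holds an XML string that has to be parsed into fields. A minimal sketch of that per-record step in plain Python, assuming each record is a base64-encoded UTF-8 XML document with a flat structure; the `<event>`, `<id>`, and `<type>` elements are hypothetical. A function like this could be registered with `pyspark.sql.functions.udf`, though that wiring is not shown here.

```python
import base64
import xml.etree.ElementTree as ET


def decode_xml_record(b64_payload: str) -> dict:
    """Decode one base64 payload and map each child element of the root to its text."""
    xml_text = base64.b64decode(b64_payload).decode("utf-8")
    root = ET.fromstring(xml_text)
    # Assumes a flat document: only direct children of the root are read.
    return {child.tag: child.text for child in root}


# Hypothetical record, encoded here so the example is self-contained.
sample = base64.b64encode(b"<event><id>42</id><type>click</type></event>").decode("ascii")
print(decode_xml_record(sample))  # {'id': '42', 'type': 'click'}
```

Nested XML would need a recursive walk or an explicit schema; the flat mapping keeps the sketch short.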

Re: Release Apache Spark 2.4.4

2019-08-14 Thread Dongjoon Hyun
Thank you, DB, Takeshi, Hyukjin, Sean, Kazuaki, Holden, Wenchen! I'll create the tag for 2.4.4-rc1 next Monday. For SPARK-27234, it looks like that to me, too. Thanks, Dongjoon. On Wed, Aug 14, 2019 at 9:13 AM Holden Karau wrote: > That looks like more of a feature than a bug fix unless I’m missi

Re: help understanding physical plan

2019-08-14 Thread Tianlang
Hi, maybe you can look at the Spark UI. The physical plan contains no timing information. On 2019/8/13 at 10:45 PM, Marcelo Valle wrote: Hi, I have a job running on AWS EMR. It's basically a join between 2 tables (Parquet files on S3), one somewhat large (around 50 GB) and the other small (less than
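For a join between a large and a small table, the physical plan usually shows either `BroadcastHashJoin` or `SortMergeJoin`; Spark chooses the broadcast strategy when one side's estimated size is below `spark.sql.autoBroadcastJoinThreshold` (10 MB by default). The idea behind the broadcast variant can be sketched in plain Python: build an in-memory hash table from the small side, then stream the large side once and probe it. This is a conceptual illustration, not Spark's code; the row layout and the `id` key are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List


def hash_join(
    large_rows: Iterable[dict], small_rows: Iterable[dict], key: str
) -> Iterator[dict]:
    """Inner join: build a hash table on the small side, probe it with the large side."""
    # Build phase: the small side is held entirely in memory.
    index: Dict[object, List[dict]] = defaultdict(list)
    for row in small_rows:
        index[row[key]].append(row)
    # Probe phase: each large-side row is read once and never shuffled or sorted.
    for row in large_rows:
        for match in index.get(row[key], []):
            yield {**row, **match}


large = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]
small = [{"id": 1, "name": "x"}, {"id": 3, "name": "z"}]
print(list(hash_join(large, small, "id")))
```

A sort-merge join instead sorts both sides on the key and merges them, which costs a shuffle of the large table; comparing the stage durations in the Spark UI shows which part of the plan dominates.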

Re: Spark Streaming concurrent calls

2019-08-14 Thread Tianlang
Hi, could increasing the number of partitions in the Kafka topic help? On 2019/8/13 at 10:53 PM, Amit Sharma wrote: I am using Kafka with Spark Streaming. My UI application sends requests to the streaming job through Kafka. The problem is that the streaming job handles one request at a time, so if multiple users send requests at the same time they have
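The reason partition count matters: with the direct Kafka integration, each Kafka partition becomes one Spark partition, so records in different partitions can be processed by different tasks in parallel, while records within one partition stay in order. A minimal plain-Python sketch of that effect, assuming requests are keyed by user; the CRC32 hash is a stand-in (Kafka's default partitioner uses murmur2), and `dispatch` and `process_partition` are hypothetical helpers, not Kafka or Spark APIs.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def assign_partition(key: str, num_partitions: int) -> int:
    """Map a record key to a partition (CRC32 stand-in for Kafka's murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def process_partition(payloads: List[str]) -> List[str]:
    """Handle one partition's records sequentially, preserving their order."""
    return [f"handled {p}" for p in payloads]


def dispatch(requests: List[Tuple[str, str]], num_partitions: int) -> Dict[int, List[str]]:
    """Group (user, payload) requests by partition and process partitions concurrently."""
    partitions: Dict[int, List[str]] = defaultdict(list)
    for user, payload in requests:
        partitions[assign_partition(user, num_partitions)].append(payload)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        results = pool.map(process_partition, partitions.values())
        return dict(zip(partitions.keys(), results))


requests = [("alice", "r1"), ("bob", "r2"), ("alice", "r3"), ("carol", "r4")]
print(dispatch(requests, num_partitions=4))
```

With a single partition every request is handled in one sequential queue; with more partitions, requests from different users can land in different partitions and be handled side by side.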