Hi, here is my question: Spark code run on Zeppelin is unable to find the
Kafka source even though a dependency is specified. Is there any way to
fix this? The Zeppelin version is 0.9.0, the Spark version is 2.4.6, and
the Kafka version is 2.4.1. I have specified the dependency in the packages
and add a j
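(A common fix, assuming the job uses the Structured Streaming Kafka source: set the connector on the Spark interpreter itself in Zeppelin rather than inside a notebook paragraph. The coordinates below are a sketch assuming Spark 2.4.6 built against Scala 2.11; adjust them to your actual build.)

```properties
# Zeppelin UI: Interpreter > spark > Properties, then restart the interpreter
spark.jars.packages    org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.6
```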
Thanks Sean! To combat the skew I do have another column I partitionBy, and
that has worked well (like below). However, in the image I attached to my
original email it looks like 2 tasks processed nothing; am I
reading the Spark UI task table right? All 4 dates have data - 2 dates have
~200MB & other 2
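(For what it's worth, empty tasks after hash-partitioning on a low-cardinality key are expected: with 4 distinct dates hashed into 4 partitions, a collision is far more likely than not. A quick back-of-envelope check in plain Python, no Spark needed:)

```python
from math import factorial

keys, partitions = 4, 4
# Probability that 4 hashed keys all land in distinct partitions
# (classic balls-in-bins): 4! / 4^4
p_all_distinct = factorial(keys) / partitions ** keys
print(p_all_distinct)  # 0.09375, i.e. ~91% chance at least two dates share a task
```

So seeing 2 of 4 tasks process nothing is entirely consistent with hash partitioning on 4 dates, not a misreading of the task table.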
Yes, you'll generally get 1 partition per block, and 1 task per partition.
The amount of RAM isn't directly relevant; it's not loaded into memory. But
you may nevertheless get some improvement with larger partitions / tasks,
though typically only if your tasks are very small and very fast right now.
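(Since the rule of thumb above makes task count a function of input size, you can estimate it directly. A sketch assuming the common HDFS default block size of 128 MB; your cluster's `dfs.blocksize` may differ:)

```python
import math

BLOCK_SIZE_MB = 128  # assumed default block size; check your cluster config

def estimated_tasks(file_size_mb: float) -> int:
    # one partition (hence one task) per block, rounding up
    return math.ceil(file_size_mb / BLOCK_SIZE_MB)

print(estimated_tasks(200))   # 2 tasks for a ~200 MB input
print(estimated_tasks(4000))  # 32 tasks for a ~4 GB input
```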
Thanks, you mean in a for loop? Could you please put some pseudocode in Spark?
On Fri, Jun 19, 2020 at 8:39 AM Jörn Franke wrote:
> Make every JSON object a line and then read it as JSON Lines, not as multiline.
>
> On 19.06.2020 at 14:37, Chetan Khatri wrote:
>
>
> All transactions are in JSON; it is n
Make every JSON object a line and then read it as JSON Lines, not as multiline.
> On 19.06.2020 at 14:37, Chetan Khatri wrote:
>
>
> All transactions are in JSON; it is not a single array.
>
>> On Thu, Jun 18, 2020 at 12:55 PM Stephan Wehner
>> wrote:
>> It's an interesting problem. What is the structure of the file?
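(The one-object-per-line suggestion can be done as a preprocessing pass before Spark ever sees the file. A minimal sketch in plain Python, assuming the file is a stream of concatenated, possibly pretty-printed, JSON objects; `to_json_lines` is a hypothetical helper name:)

```python
import json

def to_json_lines(src_path: str, dst_path: str) -> int:
    """Rewrite a stream of concatenated JSON objects as one object per line."""
    decoder = json.JSONDecoder()
    written = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        buf = src.read()  # for a real 50 GB file, read and decode in chunks instead
        idx = 0
        while idx < len(buf):
            if buf[idx].isspace():  # skip whitespace between objects
                idx += 1
                continue
            obj, idx = decoder.raw_decode(buf, idx)
            dst.write(json.dumps(obj) + "\n")
            written += 1
    return written
```

After that, `spark.read.json(dst_path)` without `multiLine=True` reads one record per line, so Spark can split the file across tasks instead of parsing it on a single executor.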
All transactions are in JSON; it is not a single array.
On Thu, Jun 18, 2020 at 12:55 PM Stephan Wehner
wrote:
> It's an interesting problem. What is the structure of the file? One big
> array? Or a hash with many key-value pairs?
>
> Stephan
>
> On Thu, Jun 18, 2020 at 6:12 AM Chetan Khatri
> wrote:
Yes
On Thu, Jun 18, 2020 at 12:34 PM Gourav Sengupta
wrote:
> Hi,
> So you have a single JSON record in multiple lines?
> And all the 50 GB is in one file?
>
> Regards,
> Gourav
>
> On Thu, 18 Jun 2020, 14:34 Chetan Khatri,
> wrote:
>
>> It is dynamically generated and written to an S3 bucket, not
I got an illegal argument error with 2.4.6.
I then pointed my Jupyter notebook to the 3.0 version and it worked as
expected, using the same .ipynb file.
I was following this machine learning example:
“Your First Apache Spark ML Model” by Favio Vázquez
https://towardsdatascience.com/your-first-apache-spa
AFAIK it has been there since Spark 2.0 in 2016. Not certain about Spark
1.5/1.6.
On Thu, 18 Jun 2020 at 23:56, Anwar AliKhan
wrote:
> I first ran the command
> df.show()
>
> For a sanity check of my DataFrame.
>
> I wasn't impressed with the display.
>
> I then ran
> df.toPandas() in Jupyter N