Hi, here is my question: Spark code run on Zeppelin is unable to find the
Kafka source even though a dependency is specified. Is there any way to
fix this? The Zeppelin version is 0.9.0, the Spark version is 2.4.6, and
the Kafka version is 2.4.1. I have specified the dependency in the packages
and add a j
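(A common fix, assuming the job uses the Structured Streaming Kafka source: set the connector on the Spark interpreter itself in Zeppelin rather than inside a notebook paragraph. The coordinates below are a sketch assuming Spark 2.4.6 built against Scala 2.11; adjust them to your actual build.)

```properties
# Zeppelin UI: Interpreter > spark > Properties, then restart the interpreter
spark.jars.packages    org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.6
```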
Thanks Sean! To combat the skew I do have another column I partitionBy, and
that has worked well (like below). However, in the image I attached to my
original email it looks like 2 tasks processed nothing; am I
reading the Spark UI task table right? All 4 dates have data - 2 dates have
~200MB & other 2
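(For what it's worth, empty tasks after hash-partitioning on a low-cardinality key are expected: with 4 distinct dates hashed into 4 partitions, a collision is far more likely than not. A quick back-of-envelope check in plain Python, no Spark needed:)

```python
from math import factorial

keys, partitions = 4, 4
# Probability that 4 hashed keys all land in distinct partitions
# (classic balls-in-bins): 4! / 4^4
p_all_distinct = factorial(keys) / partitions ** keys
print(p_all_distinct)  # 0.09375, i.e. ~91% chance at least two dates share a task
```

So seeing 2 of 4 tasks process nothing is entirely consistent with hash partitioning on 4 dates, not a misreading of the task table.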
Yes, you'll generally get 1 partition per block, and 1 task per partition.
The amount of RAM isn't directly relevant; it's not loaded into memory. But
you may nevertheless get some improvement with larger partitions / tasks,
though typically only if your tasks are very small and very fast right now.
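(Since the rule of thumb above makes task count a function of input size, you can estimate it directly. A sketch assuming the common HDFS default block size of 128 MB; your cluster's `dfs.blocksize` may differ:)

```python
import math

BLOCK_SIZE_MB = 128  # assumed default block size; check your cluster config

def estimated_tasks(file_size_mb: float) -> int:
    # one partition (hence one task) per block, rounding up
    return math.ceil(file_size_mb / BLOCK_SIZE_MB)

print(estimated_tasks(200))   # 2 tasks for a ~200 MB input
print(estimated_tasks(4000))  # 32 tasks for a ~4 GB input
```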
Thanks, you mean in a for loop? Could you please put some pseudocode in Spark?
On Fri, Jun 19, 2020 at 8:39 AM Jörn Franke wrote:
> Make every JSON object a line and then read it as JSON Lines, not as multiline.
>
> On 19.06.2020 at 14:37, Chetan Khatri wrote:
>
>
> All transactions are in JSON; it is n
Make every JSON object a line and then read it as JSON Lines, not as multiline.
> On 19.06.2020 at 14:37, Chetan Khatri wrote:
>
>
> All transactions are in JSON; it is not a single array.
>
>> On Thu, Jun 18, 2020 at 12:55 PM Stephan Wehner
>> wrote:
>> It's an interesting problem. What is the structure of the file?
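(The one-object-per-line suggestion can be done as a preprocessing pass before Spark ever sees the file. A minimal sketch in plain Python, assuming the file is a stream of concatenated, possibly pretty-printed, JSON objects; `to_json_lines` is a hypothetical helper name:)

```python
import json

def to_json_lines(src_path: str, dst_path: str) -> int:
    """Rewrite a stream of concatenated JSON objects as one object per line."""
    decoder = json.JSONDecoder()
    written = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        buf = src.read()  # for a real 50 GB file, read and decode in chunks instead
        idx = 0
        while idx < len(buf):
            if buf[idx].isspace():  # skip whitespace between objects
                idx += 1
                continue
            obj, idx = decoder.raw_decode(buf, idx)
            dst.write(json.dumps(obj) + "\n")
            written += 1
    return written
```

After that, `spark.read.json(dst_path)` without `multiLine=True` reads one record per line, so Spark can split the file across tasks instead of parsing it on a single executor.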
All transactions are in JSON; it is not a single array.
On Thu, Jun 18, 2020 at 12:55 PM Stephan Wehner
wrote:
> It's an interesting problem. What is the structure of the file? One big
> array? Or a hash with many key-value pairs?
>
> Stephan
>
> On Thu, Jun 18, 2020 at 6:12 AM Chetan Khatri
> wrote:
Yes
On Thu, Jun 18, 2020 at 12:34 PM Gourav Sengupta
wrote:
> Hi,
> So you have a single JSON record in multiple lines?
> And all the 50 GB is in one file?
>
> Regards,
> Gourav
>
> On Thu, 18 Jun 2020, 14:34 Chetan Khatri,
> wrote:
>
>> It is dynamically generated and written to an S3 bucket, not
I got an illegal argument error with 2.4.6.
I then pointed my Jupyter notebook to the 3.0 version and it worked as
expected, using the same .ipynb file.
I was following this machine learning example:
“Your First Apache Spark ML Model” by Favio Vázquez
https://towardsdatascience.com/your-first-apache-spa
AFAIK it has been there since Spark 2.0 in 2016. Not certain about Spark
1.5/1.6.
On Thu, 18 Jun 2020 at 23:56, Anwar AliKhan
wrote:
> I first ran the command
> df.show()
>
> For a sanity check of my DataFrame.
>
> I wasn't impressed with the display.
>
> I then ran
> df.toPandas() in Jupyter N