Hi Chetan,
You can create a static Parquet file, and when you
create a DataFrame you can pass the locations of both files, with the
option mergeSchema set to true. That way you will always get a DataFrame
back, even if the original file is not present.
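A minimal sketch of that read (the S3 paths here are hypothetical; the
static file exists only so the read always has at least one Parquet file
to resolve a schema from):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("MergeSchemaRead").getOrCreate()

    // Both paths are hypothetical stand-ins for your static file and real data.
    val df = spark.read
      .option("mergeSchema", "true")
      .parquet("s3://my-bucket/static.parquet", "s3://my-bucket/data.parquet")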
Kuchekar, Nilesh
On Sat, May 9
Hi,
Is there a way we can customize the partitioner for a Dataset to be the
Hive hash partitioner rather than the Murmur3 partitioner?
Regards,
Kuchekar, Nilesh
Hi Gaurav,
You might want to look into the Lambda Architecture with Spark.
https://www.youtube.com/watch?v=xHa7pA94DbA
Regards,
Kuchekar, Nilesh
On Thu, May 18, 2017 at 8:58 PM, Gaurav1809 wrote:
> Hello gurus,
>
> How exactly does it work in real-world scenarios when I
Hi,
I am running a Spark job that saves the computed data (a massive amount)
to S3. On the Spark UI I see that some jobs are still active, but there is
no activity in the logs. Also, all the data has been written to S3 (I
verified each bucket; each has a _SUCCESS file).
Am I missing something?
Thanks.
Kuche
,(y,index))
now reduceByKey, and then sort the values within each key by that index.
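A rough sketch of what I mean (the sample data is made up to mirror your
example):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("OrderWithinKey"))

    // Made-up sample mirroring your RDD of (ID, value) pairs.
    val rdd = sc.parallelize(Seq(("ID2", 18159), ("ID1", 18159), ("ID2", 36318)))

    // Tag each value with its original position, reduce by key into one
    // sequence per ID, then sort each sequence by that index.
    val ordered = rdd
      .zipWithIndex()
      .map { case ((id, value), index) => (id, Seq((value, index))) }
      .reduceByKey(_ ++ _)
      .mapValues(_.sortBy(_._2).map(_._1))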
Thanks.
Kuchekar, Nilesh
On Tue, Jul 26, 2016 at 7:35 PM, janardhan shetty wrote:
> Let me provide step wise details:
>
> 1.
> I have an RDD = {
> (ID2,18159) - *element 1*
> (ID1,18159)
Stage tab of the Spark UI.
Kuchekar, Nilesh
On Tue, Jul 19, 2016 at 8:16 PM, Aaron Jackson wrote:
> Hi,
>
> I have a cluster with 15 nodes of which 5 are HDFS nodes. I kick off a
> job that creates some 120 stages. Eventually, the active and pending
> stages reduce down to a small
tune spark
<http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/>,
cheatsheet for tuning spark <http://techsuppdiva.github.io/spark1.6.html>.
Hope this helps. If it does, keep the community posted on what resolved
your issue.
Thanks.
Kuchekar, Nilesh
On Sat, Feb
are setting.
Kuchekar, Nilesh
On Wed, Feb 17, 2016 at 8:02 PM, wrote:
> Hi All,
>
> I have been facing memory issues in Spark. I'm using Spark SQL on AWS EMR.
> I have an ~50 GB file in AWS S3. I want to read this file in a BI tool
> connected to Spark SQL on the Thrift server over O
ad","4000")
conf = conf.set("spark.executor.cores", "4")
           .set("spark.executor.memory", "15G")
           .set("spark.executor.instances", "6")
Is it also possible to use reduceByKey in place of groupByKey? That might
help with the shuffling too.
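For instance, if the job is summing per key (a hedged sketch; the data and
names are made up):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("ReduceVsGroup"))

    // Hypothetical (key, value) pairs standing in for your data.
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

    // groupByKey ships every value across the network before aggregating.
    val viaGroup = pairs.groupByKey().mapValues(_.sum)

    // reduceByKey combines values map-side first, so far less data is shuffled.
    val viaReduce = pairs.reduceByKey(_ + _)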
K