Not sure if it's related, but I got a similar stack overflow error some time
back while reading files and converting them to Parquet.
> Stack trace-
> 16/06/02 02:23:54 INFO YarnAllocator: Driver requested a total number of
> 32769 executor(s).
> 16/06/02 02:23:54 INFO ExecutorAllocationManager:
You could use PySpark to run the Python code on Spark directly. That will
cut the effort of learning Scala.
https://spark.apache.org/docs/0.9.0/python-programming-guide.html
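A minimal PySpark sketch of the idea (Spark 1.x-era API; the input and output paths are just placeholders) -

from pyspark import SparkContext
from pyspark.sql import SQLContext

# Read the source files and write them back out as Parquet, all from Python.
sc = SparkContext(appName="python-to-parquet")
sqlContext = SQLContext(sc)

df = sqlContext.read.json("hdfs:///input/events.json")   # placeholder input path
df.write.parquet("hdfs:///output/events_parquet")        # placeholder output path

You would submit it with ./bin/spark-submit your_script.py instead of rewriting it in Scala.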
- Thanks, via mobile, excuse brevity.
On Jun 18, 2016 2:34 PM, "Aakash Basu" wrote:
> Hi all,
>
> I've a python code, wh
I'm new, but I know it and am still learning. But I need help in
>>> converting this code to Scala. I have nearly no knowledge of Python, hence I
>>> requested the experts here.
>>>
>>> Hope you get me now.
>>>
>>> Thanks,
>>> Aakash.
How about supplying the jar directly in spark-submit -
./bin/spark-submit \
> --class org.apache.spark.examples.SparkPi \
> --master yarn-client \
> --driver-memory 512m \
> --num-executors 2 \
> --executor-memory 512m \
> --executor-cores 2 \
> /user/shihj/spark_lib/spark-examples-1.6.1-hadoop2.6
>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>> get error at once
>> -- Original Message --
>> *From:* "Yash Sharma"
>> *Sent:* Wednesday, June 22, 2016, 2:04 PM
>>
aster:9000/user/shihj/spark_lib/spark-examples-1.6.1-hadoop2.6.0.jar
> shihj@master:~/workspace/hadoop-2.6.4$
> I can find the jar on all nodes.
>
>
> -- Original Message --
> *From:* "Yash Sharma"
> *Sent:* Wednesday, June 22, 2016, 2:18 PM
> *To:* "S
.jar
On Wed, Jun 22, 2016 at 4:27 PM, 另一片天 <958943...@qq.com> wrote:
> Is it able to run in local mode?
>
> What do you mean? Standalone mode?
>
>
> -- Original Message --
> *From:* "Yash Sharma"
> *Sent:* Wednesday, June 22, 2016, 2:18 PM
> *To:*
org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
>
>
> -- Original Message --
> *From:* "Yash Sharma"
> *Sent:* Wednesday, June 22, 2016, 2:28 PM
> *To:* "另
spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.mai
especially the jar package), because
> they are very big the application will wait for too long. Is there a good method?
> So I configured that parameter, but did not get the effect I wanted.
>
>
> -- Original Message --
> *From:* "Yash Sharma"
> *Sent:* Wednesday, June 22, 2016, 2:34 PM
with a non-zero exit code 1
> Failing this attempt. Failing the application.
>
> but the command gives an error
>
> shihj@master:~/workspace/hadoop-2.6.4$ yarn logs -applicationId
> application_1466568126079_0006
> Usage: yarn [options]
>
> yarn: error: no such option: -a
>
I would say use dynamic allocation rather than a fixed number of executors.
Provide whatever executor memory you would like.
Deciding the values requires a couple of test runs to check what works
best for you.
You could try something like -
--driver-memory 1G \
--executor-memory 2G \
--executor-c
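For reference, a fuller set of flags might look like this (values are only illustrative, and dynamic allocation on YARN needs the external shuffle service enabled):
--driver-memory 1G \
--executor-memory 2G \
--executor-cores 2 \
--conf spark.shuffle.service.enabled=true \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.dynamicAllocation.minExecutors=2 \
--conf spark.dynamicAllocation.maxExecutors=48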
Spark is more of an execution engine than a database. Hive is a data
warehouse, but I still like treating it as an execution engine.
For databases, you could compare HBase and Cassandra as they both have very
wide usage and proven performance. We have used Cassandra in the past and
were very
This answers exactly what you are looking for -
http://stackoverflow.com/a/34204640/1562474
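Without knowing the full use case, here is a minimal PySpark sketch of a per-row merge of two array columns via a UDF (column names and data are made up, and the linked answer may take a different approach):

from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, StringType

sc = SparkContext(appName="merge-array-columns")
sqlContext = SQLContext(sc)

# Toy frame with two array columns per row
df = sqlContext.createDataFrame(
    [(1, ["a", "b"], ["b", "c"])],
    ["id", "tags_a", "tags_b"])

# Merge the two arrays as a set union
merge = udf(lambda a, b: sorted(set((a or []) + (b or []))), ArrayType(StringType()))
df.withColumn("tags", merge(df.tags_a, df.tags_b)).show()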
On Tue, Jul 12, 2016 at 6:40 AM, Pedro Rodriguez
wrote:
> Is it possible with Spark SQL to merge columns whose types are Arrays or
> Sets?
>
> My use case would be something like this:
>
> DF types
> id:
Looks like the write to Aerospike is taking too long.
Could you try writing the RDD directly to the filesystem, skipping the
Aerospike write?
foreachPartition at WriteToAerospike.java:47, took 338.345827 s
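If it helps, a quick timing sketch (PySpark shown, but the Scala/Java RDD has the same saveAsTextFile call; the output path is just a placeholder):

import time

def write_to_fs(rdd, path="hdfs:///tmp/aerospike_debug_output"):
    # Time a plain filesystem write of the same RDD to see how much of the
    # 338 s is actually spent inside the Aerospike client.
    start = time.time()
    rdd.saveAsTextFile(path)
    print("filesystem write took %.1f s" % (time.time() - start))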
- Thanks, via mobile, excuse brevity.
On Jul 12, 2016 8:08 PM, "Saurav Sinha" wrote:
> Hi,
I struggled with Kinesis for a long time and documented all my findings
at -
http://stackoverflow.com/questions/35567440/spark-not-able-to-fetch-events-from-amazon-kinesis
Let me know if it helps.
Cheers,
Yash
- Thanks, via mobile, excuse brevity.
On Jul 16, 2016 6:05 AM, "dharmendra" wr
Based on the behavior of Spark [1], Overwrite mode will delete all your
data when you try to overwrite a particular partition.
What I did (see the sketch below):
- Use the S3 API to delete all partitions
- Use the Spark DataFrame writer in Append mode [2]
1.
http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-deletes-a
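A rough sketch of the delete step in Python (the bucket and prefix are taken from the path above but are only illustrative; assumes boto3 is available):

import boto3

# 1. Delete the partition objects that are about to be rewritten
s3 = boto3.resource("s3")
bucket = s3.Bucket("data")
bucket.objects.filter(Prefix="test2/events/year=2016/month=07/").delete()  # illustrative partition

# 2. Then write the new data with the DataFrame writer in Append mode
#    (see the corrected write call in the next message)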
Correction -
dataDF.write.partitionBy("year", "month",
"date").mode(SaveMode.Append).text("s3://data/test2/events/")
On Tue, Jul 26, 2016 at 10:59 AM, Yash Sharma wrote:
> Based on the behavior of spark [1], Overwrite mode will delete all your
> data when you try to overwrite a part
Hi Evan,
SPARK-9629 referred to connection issues with ZooKeeper. Could you check
if it's working fine in your setup?
Also, please share any other error logs you might be getting.
- Thanks, via mobile, excuse brevity.
On Dec 22, 2015 5:00 PM, "yaoxiaohua" wrote:
> Hi,
>
> I encounter a
Hi Sri,
That would depend on the organization from which you are seeking the
certification.
This list is more helpful for questions and information about using Spark
and/or contributing to Spark.
Good luck
- Thanks, via mobile, excuse brevity.
On Dec 22, 2015 3:56 PM,
Hi Jan,
Is the error because a past run of the job has already written to the
location?
In that case you can add more granularity with 'time' along with year and
month. That should give you a distinct path for every run.
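Something like this, as a sketch (the bucket and prefix are made up):

from datetime import datetime

# Add a per-run component to the output path so re-runs never collide
run_id = datetime.utcnow().strftime("%Y%m%d_%H%M%S")
output_path = "s3://my-bucket/events/year=2016/month=01/run=%s/" % run_id
# df.write.parquet(output_path)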
Let us know if it helps or if I missed anything.
Good luck
- Thanks, via mo
>>
>> INFO ClientCnxn: Client session timed out, have not heard from server in
>> 40015ms for sessionid 0x351c416297a145a, closing socket connection and
>> attempting reconnect
>>
>> Before spark2 master process shut down.
>>
>> I don’t see any zookeep
Could you share the ulimit for your setup, please?
- Thanks, via mobile, excuse brevity.
On Dec 22, 2015 6:39 PM, "Priya Ch" wrote:
> Jakob,
>
>Increased the settings like fs.file-max in /etc/sysctl.conf and also
> increased user limit in /etc/security/limits.conf. But still see the same
>
.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:137)
> at
> com.databricks.spark.avro.package$AvroDataFrameWriter$$anonfun$avro$1.apply(package.scala:37)
> at
> com.databricks.spark.avro.package$AvroDataFrameWriter$$anonfun$avro$1.apply(package.scala:37)
> at
> $iwC$$iwC$$iwC
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
> at
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
> at
> org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:69)
> at
> org.apache.spark.
Hi Raju,
Could you please explain your expected behavior with the DStream? The
DStream will have events only from the 'fromOffsets' that you provided in
createDirectStream (which I think is the expected behavior).
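For reference, a minimal sketch of passing fromOffsets (PySpark API shown, the Scala API mirrors it; the topic, broker and offsets are made up, and the spark-streaming-kafka package must be on the classpath):

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils, TopicAndPartition

sc = SparkContext(appName="direct-stream-offsets")
ssc = StreamingContext(sc, 10)

# Start consuming the topic from an explicit offset per partition
fromOffsets = {TopicAndPartition("events", 0): 0}
stream = KafkaUtils.createDirectStream(
    ssc,
    ["events"],
    {"metadata.broker.list": "broker:9092"},
    fromOffsets=fromOffsets)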
For the smaller files, you will have to deal with smaller files if you
intend to
s. Streaming context is getting prepared from the
> checkpoint directory and started consuming from the topic offsets which
> were stored in checkpoint directory.
>
>
> On Sat, Jan 23, 2016 at 3:44 PM, Yash Sharma wrote:
>
>> Hi Raju,
>> Could you please explain your expec
> be 12 executors for testing and let me know the status.
>
>
>
>
> On Fri, Sep 23, 2016 at 3:13 PM +0530, "Yash Sharma"
> wrote:
>
> Thanks Aditya, appreciate the help.
>>
>> I had the exact
:27 AM, Yash Sharma wrote:
> Have been playing around with configs to crack this. Adding them here
> where it would be helpful to others :)
> Number of executors and timeout seemed like the core issue.
>
> {code}
> --driver-memory 4G \
> --conf spark.dynamicAllocation.en
emory. This can be around 48 assuming 12 nodes x 4 cores each. You could
> start with processing a subset of your data and see if you are able to get
> a decent performance. Then gradually increase the maximum # of execs for
> dynamic allocation and process the remaining data.
>
>
files you are trying to read? The number of
> executors is very high
> On 24 Sep 2016 10:28, "Yash Sharma" wrote:
>
>> Have been playing around with configs to crack this. Adding them here
>> where it would be helpful to others :)
>> Number of executors and timeout se
Hi Shyla,
We could suggest more based on what exactly you're trying to do. But with the
given information: if you have your Spark job ready, you could schedule it
via any scheduling framework like Airflow, Celery, or cron, depending on how
simple or complex you want your workflow to be.
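If you go the Airflow route, a minimal DAG could look like this (the DAG id, schedule, and job path are only placeholders):

from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    dag_id="daily_spark_job",
    start_date=datetime(2017, 1, 1),
    schedule_interval="@daily")

# Each run simply shells out to spark-submit
submit = BashOperator(
    task_id="spark_submit",
    bash_command="spark-submit --master yarn --deploy-mode cluster /path/to/job.py",
    dag=dag)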
Cheers,
Yash
On Fr
Hi Nayan,
I use --packages with spark-shell and spark-submit. Could you
please try that and let us know:
Command:
spark-submit --packages com.databricks:spark-csv_2.11:1.4.0
On Fri, 7 Apr 2017 at 00:39 nayan sharma wrote:
> spark version 1.6.2
> scala version 2.10.5
>
> On 06-Apr-2
Hi Ramesh,
Could you share some logs please? A pastebin? The DAG view?
Did you check for GC pauses, if any?
On Thu, 6 Apr 2017 at 21:55 Ramesh Krishnan wrote:
> I have a use case of distinct on a dataframe. When i run the application
> is getting stuck at LINE *ShuffleBlockFetcherIterator: Started 4
n sharma wrote:
> Hi Yash,
> I know this will work perfectly, but here I wanted to read the CSV using the
> assembly jar file.
>
> Thanks,
> Nayan
>
> On 07-Apr-2017, at 10:02 AM, Yash Sharma wrote:
>
> Hi Nayan,
> I use the --packages with the spark shell and the spar
Hi JG,
Here are my cluster configs if it helps.
Cheers.
EMR: emr-5.8.0
Hadoop distribution: Amazon 2.7.3
AWS sdk: /usr/share/aws/aws-java-sdk/aws-java-sdk-1.11.160.jar
Applications:
Hive 2.3.0
Spark 2.2.0
Tez 0.8.4
On Tue, 3 Oct 2017 at 12:29 JG Perrin wrote:
> Hey Sparkians,
>
>
>
> What ve
Please send mail to user-unsubscr...@spark.apache.org to unsubscribe.
Cheers
On Fri., 19 Jan. 2018, 5:28 pm Sbf xyz, wrote:
>
Please send mail to user-unsubscr...@spark.apache.org to unsubscribe.
Cheers
On Fri., 19 Jan. 2018, 5:11 pm Anu B Nair, wrote:
>