Not sure if it's related, but I got a similar stack overflow error some time
back while reading files and converting them to Parquet.
> Stack trace-
> 16/06/02 02:23:54 INFO YarnAllocator: Driver requested a total number of
> 32769 executor(s).
> 16/06/02 02:23:54 INFO ExecutorAllocationManager:
You could use PySpark to run the Python code on Spark directly. That will
cut the effort of learning Scala.
https://spark.apache.org/docs/0.9.0/python-programming-guide.html
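A minimal PySpark sketch of the idea (Spark 1.x-era API; the input and output paths are just placeholders) -

from pyspark import SparkContext
from pyspark.sql import SQLContext

# Read the source files and write them back out as Parquet, all from Python.
sc = SparkContext(appName="python-to-parquet")
sqlContext = SQLContext(sc)

df = sqlContext.read.json("hdfs:///input/events.json")   # placeholder input path
df.write.parquet("hdfs:///output/events_parquet")        # placeholder output path

You would submit it with ./bin/spark-submit your_script.py instead of rewriting it in Scala.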
- Thanks, via mobile, excuse brevity.
On Jun 18, 2016 2:34 PM, "Aakash Basu" wrote:
> Hi all,
>
> I've a python code, wh
I'm new, but I know it and am still learning. But I need help in
>>> converting this code to Scala. I have nearly no knowledge of Python, hence I
>>> requested the experts here.
>>>
>>> Hope you get me now.
>>>
>>> Thanks,
>>> Aakash.
How about supplying the jar directly in spark-submit -
./bin/spark-submit \
> --class org.apache.spark.examples.SparkPi \
> --master yarn-client \
> --driver-memory 512m \
> --num-executors 2 \
> --executor-memory 512m \
> --executor-cores 2 \
> /user/shihj/spark_lib/spark-examples-1.6.1-hadoop2.6
>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>> get error at once
>> -- Original Message --
>> *From:* "Yash Sharma"
>> *Sent:* Wednesday, June 22, 2016, 2:04 PM
>>
aster:9000/user/shihj/spark_lib/spark-examples-1.6.1-hadoop2.6.0.jar
> shihj@master:~/workspace/hadoop-2.6.4$
> I can find the jar on all nodes.
>
>
> -- Original Message --
> *From:* "Yash Sharma"
> *Sent:* Wednesday, June 22, 2016, 2:18 PM
> *To:* "S
.jar
On Wed, Jun 22, 2016 at 4:27 PM, 另一片天 <958943...@qq.com> wrote:
> Is it able to run in local mode?
>
> What do you mean? Standalone mode?
>
>
> -- Original Message --
> *From:* "Yash Sharma"
> *Sent:* Wednesday, June 22, 2016, 2:18 PM
> *To:*
org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
>
>
> -- Original Message --
> *From:* "Yash Sharma"
> *Sent:* Wednesday, June 22, 2016, 2:28 PM
> *To:* "另
spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> at org.apache.spark.deploy.SparkSubmit$.mai
especially the jar package), because
> they are very big the application will wait for too long. Is there a good method?
> So I configured that parameter, but did not get the effect I wanted.
>
>
> -- Original Message --
> *From:* "Yash Sharma"
> *Sent:* Wednesday, June 22, 2016, 2:34 PM
with a non-zero exit code 1
> Failing this attempt. Failing the application.
>
> but the command gives an error
>
> shihj@master:~/workspace/hadoop-2.6.4$ yarn logs -applicationId
> application_1466568126079_0006
> Usage: yarn [options]
>
> yarn: error: no such option: -a
>
I would say use dynamic allocation rather than a fixed number of executors.
Provide whatever executor memory you would like.
Deciding the values requires a couple of test runs to check what works
best for you.
You could try something like -
--driver-memory 1G \
--executor-memory 2G \
--executor-c
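For reference, a fuller set of flags might look like this (values are only illustrative, and dynamic allocation on YARN needs the external shuffle service enabled):
--driver-memory 1G \
--executor-memory 2G \
--executor-cores 2 \
--conf spark.shuffle.service.enabled=true \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.dynamicAllocation.minExecutors=2 \
--conf spark.dynamicAllocation.maxExecutors=48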
Spark is more of an execution engine than a database. Hive is a data
warehouse, but I still like treating it as an execution engine.
For databases, you could compare HBase and Cassandra as they both have very
wide usage and proven performance. We have used Cassandra in the past and
were very
This answers exactly what you are looking for -
http://stackoverflow.com/a/34204640/1562474
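Without knowing the full use case, here is a minimal PySpark sketch of a per-row merge of two array columns via a UDF (column names and data are made up, and the linked answer may take a different approach):

from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, StringType

sc = SparkContext(appName="merge-array-columns")
sqlContext = SQLContext(sc)

# Toy frame with two array columns per row
df = sqlContext.createDataFrame(
    [(1, ["a", "b"], ["b", "c"])],
    ["id", "tags_a", "tags_b"])

# Merge the two arrays as a set union
merge = udf(lambda a, b: sorted(set((a or []) + (b or []))), ArrayType(StringType()))
df.withColumn("tags", merge(df.tags_a, df.tags_b)).show()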
On Tue, Jul 12, 2016 at 6:40 AM, Pedro Rodriguez
wrote:
> Is it possible with Spark SQL to merge columns whose types are Arrays or
> Sets?
>
> My use case would be something like this:
>
> DF types
> id:
Looks like the write to Aerospike is taking too long.
Could you try writing the RDD directly to the filesystem, skipping the
Aerospike write?
foreachPartition at WriteToAerospike.java:47, took 338.345827 s
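If it helps, a quick timing sketch (PySpark shown, but the Scala/Java RDD has the same saveAsTextFile call; the output path is just a placeholder):

import time

def write_to_fs(rdd, path="hdfs:///tmp/aerospike_debug_output"):
    # Time a plain filesystem write of the same RDD to see how much of the
    # 338 s is actually spent inside the Aerospike client.
    start = time.time()
    rdd.saveAsTextFile(path)
    print("filesystem write took %.1f s" % (time.time() - start))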
- Thanks, via mobile, excuse brevity.
On Jul 12, 2016 8:08 PM, "Saurav Sinha" wrote:
> Hi,
I struggled with Kinesis for a long time and documented all my findings
at -
http://stackoverflow.com/questions/35567440/spark-not-able-to-fetch-events-from-amazon-kinesis
Let me know if it helps.
Cheers,
Yash
- Thanks, via mobile, excuse brevity.
On Jul 16, 2016 6:05 AM, "dharmendra" wr
Based on the behavior of Spark [1], Overwrite mode will delete all your
data when you try to overwrite a particular partition.
What I did (see the sketch below):
- Use the S3 API to delete all partitions
- Use the Spark DataFrame writer in Append mode [2]
1.
http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-deletes-a
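A rough sketch of the delete step in Python (the bucket and prefix are taken from the path above but are only illustrative; assumes boto3 is available):

import boto3

# 1. Delete the partition objects that are about to be rewritten
s3 = boto3.resource("s3")
bucket = s3.Bucket("data")
bucket.objects.filter(Prefix="test2/events/year=2016/month=07/").delete()  # illustrative partition

# 2. Then write the new data with the DataFrame writer in Append mode
#    (see the corrected write call in the next message)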
Correction -
dataDF.write.partitionBy("year", "month",
"date").mode(SaveMode.Append).text("s3://data/test2/events/")
On Tue, Jul 26, 2016 at 10:59 AM, Yash Sharma wrote:
> Based on the behavior of spark [1], Overwrite mode will delete all your
> data when you try to overwrite a part
Hi Evan,
SPARK-9629 referred to connection issues with ZooKeeper. Could you check
if it's working fine in your setup?
Also, please share any other error logs you might be getting.
- Thanks, via mobile, excuse brevity.
On Dec 22, 2015 5:00 PM, "yaoxiaohua" wrote:
> Hi,
>
> I encounter a
Hi Sri,
That would depend on the organization from which you are seeking the
certification.
This list is more helpful for questions and information about using Spark
and/or contributing to Spark.
Good luck
- Thanks, via mobile, excuse brevity.
On Dec 22, 2015 3:56 PM,
Hi Jan,
Is the error because a past run of the job has already written to the
location?
In that case you can add more granularity with 'time' along with year and
month. That should give you a distinct path for every run.
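Something like this, as a sketch (the bucket and prefix are made up):

from datetime import datetime

# Add a per-run component to the output path so re-runs never collide
run_id = datetime.utcnow().strftime("%Y%m%d_%H%M%S")
output_path = "s3://my-bucket/events/year=2016/month=01/run=%s/" % run_id
# df.write.parquet(output_path)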
Let us know if it helps or if I missed anything.
Good luck
- Thanks, via mo
>>
>> INFO ClientCnxn: Client session timed out, have not heard from server in
>> 40015ms for sessionid 0x351c416297a145a, closing socket connection and
>> attempting reconnect
>>
>> Before spark2 master process shut down.
>>
>> I don’t see any zookeep
Could you share the ulimit for your setup, please?
- Thanks, via mobile, excuse brevity.
On Dec 22, 2015 6:39 PM, "Priya Ch" wrote:
> Jakob,
>
>Increased the settings like fs.file-max in /etc/sysctl.conf and also
> increased user limit in /etc/security/limits.conf. But still see the same
>
.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:137)
> at
> com.databricks.spark.avro.package$AvroDataFrameWriter$$anonfun$avro$1.apply(package.scala:37)
> at
> com.databricks.spark.avro.package$AvroDataFrameWriter$$anonfun$avro$1.apply(package.scala:37)
> at
> $iwC$$iwC$$iwC
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
> at
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
> at
> org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:69)
> at
> org.apache.spark.
Hi Raju,
Could you please explain your expected behavior with the DStream? The
DStream will have events only from the 'fromOffsets' that you provided in
createDirectStream (which I think is the expected behavior).
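For reference, a minimal sketch of passing fromOffsets (PySpark API shown, the Scala API mirrors it; the topic, broker and offsets are made up, and the spark-streaming-kafka package must be on the classpath):

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils, TopicAndPartition

sc = SparkContext(appName="direct-stream-offsets")
ssc = StreamingContext(sc, 10)

# Start consuming the topic from an explicit offset per partition
fromOffsets = {TopicAndPartition("events", 0): 0}
stream = KafkaUtils.createDirectStream(
    ssc,
    ["events"],
    {"metadata.broker.list": "broker:9092"},
    fromOffsets=fromOffsets)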
For the smaller files, you will have to deal with smaller files if you
intend to
s. Streaming context is getting prepared from the
> checkpoint directory and started consuming from the topic offsets which
> were stored in checkpoint directory.
>
>
> On Sat, Jan 23, 2016 at 3:44 PM, Yash Sharma wrote:
>
>> Hi Raju,
>> Could you please explain your expec
> be 12 executors for testing and let me know the status.
>
>
>
>
> On Fri, Sep 23, 2016 at 3:13 PM +0530, "Yash Sharma"
> wrote:
>
> Thanks Aditya, appreciate the help.
>>
>> I had the exact
:27 AM, Yash Sharma wrote:
> Have been playing around with configs to crack this. Adding them here
> where it would be helpful to others :)
> Number of executors and timeout seemed like the core issue.
>
> {code}
> --driver-memory 4G \
> --conf spark.dynamicAllocation.en
emory. This can be around 48 assuming 12 nodes x 4 cores each. You could
> start with processing a subset of your data and see if you are able to get
> a decent performance. Then gradually increase the maximum # of execs for
> dynamic allocation and process the remaining data.
>
>
files you are trying to read? The number of
> executors is very high
> On 24 Sep 2016 10:28, "Yash Sharma" wrote:
>
>> Have been playing around with configs to crack this. Adding them here
>> where it would be helpful to others :)
>> Number of executors and timeout se
Hi Shyla,
We could suggest more based on what exactly you're trying to do. But with the
given information: if you have your Spark job ready, you could schedule it
via any scheduling framework like Airflow, Celery, or cron, depending on how
simple or complex you want your workflow to be.
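If you go the Airflow route, a minimal DAG could look like this (the DAG id, schedule, and job path are only placeholders):

from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    dag_id="daily_spark_job",
    start_date=datetime(2017, 1, 1),
    schedule_interval="@daily")

# Each run simply shells out to spark-submit
submit = BashOperator(
    task_id="spark_submit",
    bash_command="spark-submit --master yarn --deploy-mode cluster /path/to/job.py",
    dag=dag)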
Cheers,
Yash
On Fr
Hi Nayan,
I use --packages with spark-shell and spark-submit. Could you
please try that and let us know:
Command:
spark-submit --packages com.databricks:spark-csv_2.11:1.4.0
On Fri, 7 Apr 2017 at 00:39 nayan sharma wrote:
> spark version 1.6.2
> scala version 2.10.5
>
> On 06-Apr-2
Hi Ramesh,
Could you share some logs please? A pastebin? The DAG view?
Did you check for GC pauses, if any?
On Thu, 6 Apr 2017 at 21:55 Ramesh Krishnan wrote:
> I have a use case of distinct on a dataframe. When i run the application
> is getting stuck at LINE *ShuffleBlockFetcherIterator: Started 4
n sharma wrote:
> Hi Yash,
> I know this will work perfectly, but here I wanted to read the CSV using the
> assembly jar file.
>
> Thanks,
> Nayan
>
> On 07-Apr-2017, at 10:02 AM, Yash Sharma wrote:
>
> Hi Nayan,
> I use the --packages with the spark shell and the spar
Hi JG,
Here are my cluster configs if it helps.
Cheers.
EMR: emr-5.8.0
Hadoop distribution: Amazon 2.7.3
AWS sdk: /usr/share/aws/aws-java-sdk/aws-java-sdk-1.11.160.jar
Applications:
Hive 2.3.0
Spark 2.2.0
Tez 0.8.4
On Tue, 3 Oct 2017 at 12:29 JG Perrin wrote:
> Hey Sparkians,
>
>
>
> What ve
Please send mail to user-unsubscr...@spark.apache.org to unsubscribe.
Cheers
On Fri., 19 Jan. 2018, 5:28 pm Sbf xyz, wrote:
>
Please send mail to user-unsubscr...@spark.apache.org to unsubscribe.
Cheers
On Fri., 19 Jan. 2018, 5:11 pm Anu B Nair, wrote:
>