On Fri, Nov 18, 2022 at 5:50 AM Ramakrishna Rayudu <
ramakrishna560.ray...@gmail.com> wrote:
> Hi Sean,
>
> Can you please let me know what query Spark internally fires to get
> the count on a DataFrame?
>
Long count = dataframe.count();
Weird, does Teradata not support LIMIT n? Looking at the Spark source code
suggests it won't; the syntax is "SELECT TOP". I wonder if that's why the
generic query that tests existence loses the LIMIT.
But that "SELECT 1" test seems to be used for MySQL, Postgres, s
Hm, the existence queries even in 2.4.x had LIMIT 1. Are you sure nothing
else is generating or changing those queries?
We are using Spark 2.4.4.
I can see two types of queries in the DB logs:
SELECT 1 FROM (INPUT_QUERY) SPARK_GEN_SUB_0
SELECT * FROM (INPUT_QUERY) SPARK_GEN_SUB_0 WHERE 1=0
The `SELECT *` query ends with `WHERE 1=0`, but the query that starts
with `SELECT 1` has no WHERE condition.
Hm, actually that doesn't look like the queries that Spark uses to test
existence, which will be "SELECT 1 ... LIMIT 1" or "SELECT * ... WHERE 1=0"
depending on the dialect. What version, and are you sure something else is
not sending those queries?
Hi Team,
I am facing one issue. Can you please help me on this.
We are connecting to Teradata from Spark SQL with the below API:
Dataset<Row> jdbcDF = spark.read().jdbc(connectionUrl, tableQuery,
connectionProperties);
org.apache.kafka.clients.consumer.internals.AbstractCoordinator:
[Consumer
clientId=consumer-spark-kafka-source-4e7e7f32-19ab-44d5-99f5-59fb5a462af2-594190416-driver-0-1,
groupId=spark-kafka-source-4e7e7f32-19ab-44d5-99f5-59fb5a462af2-594190416-driver-0]
Member
consumer-spark-kafka-source-4e7e7f32-19ab-44d5-99f5-59fb5a462af2-594190416-driver-0-1
but even when I have duplicate column values I still get 1 in the "freq"
column. Also, when I specify the rsd parameter as 0, I get an
ArrayIndexOutOfBounds kind of error.
Why?
in the current working directory. The truststore and keystore are passed
on to the Kafka consumer/producer. However, I'm getting an error:
Failed to construct kafka consumer, Failed to load SSL keystore
dataproc-versa-sase-p12-1.jks of type JKS
Details in Stack Overflow:
https://stackoverflow.com/questions/70964198/gcp-dataproc-failed-to-construct-kafka-consumer-failed-to-load-ssl-keystore-d
That means that my other nodes (8 of them, with over 45 healthy executors)
are idle for over 3 hours.
I notice in the logs that all tasks are run at "NODE_LOCAL".
I wonder what is causing this and if I can do something to make the idle
executors also do work. 2 options:
1) It is just the way it is: at some point in this stage, there are
dependencies of the further tasks.
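One knob worth trying, on the assumption that locality wait is what keeps
the other executors idle: lower spark.locality.wait so the scheduler falls
back from NODE_LOCAL to RACK_LOCAL/ANY sooner instead of queueing tasks on
the busy node. A sketch:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("locality-tuning")          // hypothetical app name
  .config("spark.locality.wait", "1s") // default is 3s per locality level
  .getOrCreate()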
As I suggested, you need to use repartition(1) in place of coalesce(1).
That will give you a single-file output without losing parallelization for
the rest of the job.
From: James Yu
Date: Wednesday, February 3, 2021 at 2:19 PM
To: Silvio Fiorito , user
Subject: Re: Poor performance caused by coalesce to 1
tage boundary"?
>
> Thanks
Probably could also be because that coalesce can cause some upstream
transformations to also have parallelism of 1. I think (?) an OK solution
is to cache the result, then coalesce and write. Or combine the files after
the fact, or do what Silvio said.
On Wed, Feb 3, 2021 at 12:55 PM James Yu wrote:
Coalesce is reducing the parallelization of your last stage, in your case
to 1 task. So, it's natural it will give poor performance, especially with
large data. If you absolutely need a single file output, you can instead
add a stage boundary and use repartition(1). This will give your query full
parallelism for the rest of the job while still producing a single output
file.
We have a particular dataset which we aggregate from other datasets and
would like to write out to one single file (because it is small enough). We
found that after a series of transformations (GROUP BYs, FLATMAPs), we
coalesced the final RDD to 1 partition before writing it out, and this
coalesce degrades the performance; not that this additional coalesce
operation took additional runtime, but it somehow dictates the partitions
to use in the upstream transformations.
We hope
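A minimal sketch of the fix Silvio describes (df, the column names, and the
output path are hypothetical): repartition(1) adds a shuffle, i.e. a stage
boundary, so the upstream GROUP BYs/FLATMAPs keep their parallelism, while
coalesce(1) collapses the whole final stage to one task.

import org.apache.spark.sql.functions.sum

df.groupBy("key")
  .agg(sum("value").as("total"))
  .repartition(1)                 // stage boundary: upstream stays parallel
  .write.parquet("/tmp/single-file-output")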
Thanks Jungtaek Lim-2 for replying.
May I know where the sink API for both types (DSv1 and DSv2) is in the
code? Where could I see it? Under what module of Spark?
Hi,
Using Structured Streaming and sinking the data into ElasticSearch.
In the stats emitted for each batch, "numOutputRows" always shows -1 for
the ElasticSearch sink, whereas other sinks like Kafka show either 0 or
some value when they emit data.
What could be the reason?
Is there a reason you chose to start reading again from the beginning by
using a new consumer group rather than sticking to the same consumer group?
In your application, are you manually committing offsets to Kafka?
Regards,
Waleed
On Wed, Apr 1, 2020 at 1:31 AM Hrishikesh Mishra wrote:
> Hi
>
> Our Spark streaming job was working fine as expected (the number of events
> to process in a batch). But due to some reasons, we added compaction o
is 1M records and the consumer has a huge lag.
The driver log, which fetches 1 message per partition:
20/03/31 18:25:55 INFO Fetcher: [groupId=pc-nfr-loop-31-march-2020-4]
Resetting offset for partition demandIngestion.SLTarget-45 to offset 211951.
20/03/31 18:26:00 INFO Fetcher: [groupId=pc-nfr-loop-31-march
Hi community,
I'm having this error in some Kafka streams:
Caused by: java.io.FileNotFoundException: File
file:/efs/.../kafka/checkpoint/state/0/0/1.delta does not exist
Because of this I have some streams down. How can I fix this?
Thank you.
--
Miguel Silvestre
Hi,
A few thoughts to add to Nicholas' apt reply.
We were loading multiple files from AWS S3 in our Spark application. When
the Spark step that loads the files is called, the driver spends
significant time resolving the exact paths of the files from AWS S3,
especially because we specified S3 paths with regex-like globs.
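A hedged illustration (the bucket and layout are hypothetical): a glob like
the first read forces the driver to list and pattern-match many S3 keys
before any task runs, which is exactly the slow path-resolution step
described above; listing explicit prefixes avoids it.

// Wildcards: the driver expands these against the S3 listing up front.
val slow = spark.read.json("s3a://my-bucket/logs/2021-*/part-*.json")

// Explicit prefixes: no wildcard expansion on the driver.
val faster = spark.read.json(
  "s3a://my-bucket/logs/2021-01/",
  "s3a://my-bucket/logs/2021-02/")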
Hi All,
I am using Spark 2.2.
I have enabled Spark dynamic allocation with executor cores 4, driver cores
4, executor memory 12GB, and driver memory 10GB.
In the Spark UI, I see only 1 task launched per executor.
Could anyone please help on this?
Kind Regards,
Sachit Murarka
One potential case that can cause this is the optimizer being a little
overzealous in determining whether a table can be broadcast. Have you
checked the UI or query plan to see if any steps include a
BroadcastHashJoin? It's possible that the optimizer thinks that it should
be able to fit the
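A quick way to test that theory (df1/df2 are hypothetical): disable
automatic broadcast joins entirely; if performance recovers, a
mis-estimated BroadcastHashJoin was likely the culprit.

// -1 disables the size-based auto-broadcast optimization.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
// Inspect the physical plan for BroadcastHashJoin nodes.
df1.join(df2, "id").explain()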
Hi,
We have a quite long-winded Spark application we inherited, with many
stages. When we run it on our Spark cluster, things start off well enough:
workers are busy, lots of progress is made, etc. However, 30 minutes into
processing, we see the CPU usage of the workers drop drastically. At this
(Springer LNCS Proceedings)
Date: June 20, 2019
Workshop URL: http://vhpc.org
Paper Submission Deadline: May 1, 2019 (extended)
Springer LNCS, rolling abstract submission
Abstract/Paper Submission Link: https://edas.info
Forgot to add the link
https://jira.apache.org/jira/browse/KAFKA-5649
Could you please give some feedback.
Actually, our job runs fine for 17-18 hours and this behavior just suddenly
starts happening after that.
We found the following ticket which is exactly what is happening in our
Kafka cluster also.
WARN Failed to send SSL Close message
(org.apache.kafka.common.network.SslTransportLayer)
You will end
up having a lot of scheduling delay.
Maybe look at why it takes 1 minute to process 100 records, and fix that
logic. Also, I see you have batches with a higher number of events that
take a lower amount of processing time. Fix the code logic and this should
be resolved.
Thanks & Regards
Biplob Bi
We are facing an issue with very long scheduling delays in Spark (up to 1+
hours).
We are using Spark standalone. The data is being pulled from Kafka.
Any help would be much appreciated.
I have attached the screenshots.
Hi all,
My company just now approved for some of us to go to Spark Summit in SF
this year. Unfortunately, the day-long workshops on Monday are sold out
now. We are considering what we might do instead.
Have others done the half-day certification course before? Is it worth
considering? Does
I don't think it is possible to have less than 1 core for the AM; this is
due to YARN, not Spark.
The number of AMs compared to the number of executors should be small and
acceptable. If you do want to save more resources, I would suggest you use
yarn-cluster mode, where the driver and AM run
status from YARN. Is that correct?
spark.yarn.am.cores is 1, and that AM gets one full vCore on the cluster.
Because I am using DominantResourceCalculator to take vCores into account for
scheduling, this results in a lot of unused CPU capacity overall because all
those AMs each block one full vCore
Hi Gerard,
"If your actual source is Kafka, the original solution of using
`spark.streams.awaitAnyTermination` should solve the problem."
I tried literally everything; nothing worked out.
1) Tried nc from two different ports for two different streams; still
nothing worked.
2) Tried
If I use timestamp-based windowing, then my average will not be a global
average but grouped by timestamp, which is not my requirement. I want to
recalculate the avg of the entire
Date: Friday, April 13, 2018 at 11:49 PM
To: Aakash Basu <aakash.spark@gmail.com>
Cc: Panagiotis Garefalakis <panga...@gmail.com>, user <user@spark.apache.org>
Subject: Re: [Structured Streaming] More than 1 streaming in a code
Hi Panagiotis,
Wondering if you solved the problem or
---
Batch: 0
---
+----+
|aver|
+----+
| 3.0|
+----+

---
Batch: 1
---
+----+
|aver|
+----+
| 4.0|
+----+
*Updated Code -*
from pyspark.sql import SparkSession
from pyspark.sql.functions import split, col  # col added: it is used below
spark = SparkSession \
    .builder \
    .getOrCreate()  # assumed completion of the truncated builder chain
Any help?
Need urgent help. Someone please clarify the doubt?
-- Forwarded message --
From: Aakash Basu <aakash.spark@gmail.com>
Date: Thu, Apr 5, 2018 at 3:18 PM
Subject: [Structured Streaming] More than 1 streaming in a code
To: user <user@spark.apache.org>
Hi
servers", "localhost:9092") \
.option("subscribe", "test1") \
.load()
ID = data.select('value') \
.withColumn('value', data.value.cast("string")) \
.withColumn("Col1", split(col("value"
You are correct.
Hello list!
I am trying to familiarize myself with Apache Spark. I would like to ask
something about partitioning and executors.
Can I have, e.g., 500 partitions but launch only one executor that will run
operations on only 1 partition of the 500? And then I would like my job to
die. Is there any
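For what it's worth, a small sketch of why the two things are independent:
partitioning fixes how the data is split, and the executor count only
decides how many of those tasks run at once. With one single-core executor,
all 500 tasks still run, just serially.

val rdd = sc.parallelize(1 to 100000, numSlices = 500)
println(rdd.getNumPartitions) // 500, regardless of how many executors exist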
Hi Ismael,
It depends on what you mean by “support”. In general, there won’t be new
feature releases for 1.X (e.g. Spark 1.7) because all the new features are
being added to the master branch. However, there is always room for bug fix
releases if there is a catastrophic bug, and committers can
Hello,
I noticed that some of the (Big Data / Cloud Managed) Hadoop
distributions are starting to (phase out / deprecate) Spark 1.x and I
was wondering if the Spark community has already decided when it will
end support for Spark 1.x. I ask this also considering that the
latest release
We have hit a bug with GraphX when calling the connectedComponents
function, where it errors with the following:
java.lang.ArrayIndexOutOfBoundsException: -1
I've found this bug report: https://issues.apache.org/jira/browse/SPARK-5480
Has anyone else hit this issue, and if so, how did you
Looking at this
<https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala#L541>
code:
for (i <- (L - 2) to (0, -1)) {
  layerModels(i + 1).computePrevDelta(deltas(i + 1), outputs(i + 1),
    deltas(i))
}
I want to understand why we are passing
status: 1)
17/06/22 15:18:44 INFO yarn.YarnAllocator: Container marked as failed:
container_1498115278902_0001_02_13. Exit status: 1. Diagnostics: Exception
from container-launch.
Container id: container_1498115278902_0001_02_13
Exit code: 1
Stack trace: ExitCodeException exitCode=1
---
From: "Sean Owen"<so...@cloudera.com>
Date: 2017/6/15 16:13:11
To:
"user"<user@spark.apache.org>;"dev"<d...@spark.apache.org>;"??"<1427357...@qq.com>;
Subject: Re: the dependence length of RDD, can its size be greater
Yes. Imagine an RDD that results from a union of other RDDs.
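A minimal example of that point, runnable in spark-shell:

val a = sc.parallelize(1 to 10)
val b = sc.parallelize(11 to 20)
val u = a.union(b)
// union keeps one dependency per parent, so the Seq has size 2.
println(u.dependencies.size)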
A join?
Hi all,
The RDD code keeps a member as below:
dependencies_ : Seq[Dependency[_]]
It is a Seq, which means it can keep more than one dependency.
I have a question about this:
Is it possible that its size is greater than one?
If yes, how can it be produced? Would you please show me some
Hi,
What happens if I don't specify checkpointing on a DStream that has
reduceByKeyAndWindow with no inverse function? Would it cause memory to
overflow? My window sizes are 1 hour and 24 hours.
I cannot provide an inverse function for this as it is based on HyperLogLog.
My code looks
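For reference, a hedged sketch of the two variants in question (pairs is a
hypothetical DStream of key/count pairs): the form below, without an
inverse function, recomputes each window from all the batches it covers,
while the inverse-function form updates windows incrementally but requires
checkpointing.

import org.apache.spark.streaming.Minutes

val counts = pairs.reduceByKeyAndWindow(
  (a: Long, b: Long) => a + b, // associative reduce over the window
  Minutes(60),                 // window length
  Minutes(5))                  // slide interval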
Processing ASN.1 Call Detail Records with Hadoop:
http://awcoleman.blogspot.com/2014/07/processing-asn1-call-detail-records.html
I would also be interested...
2017-04-06 11:09 GMT+02:00 Hamza HACHANI <hamza.hach...@supcom.tn>:
Does anybody have a Spark code example for reading ASN.1 files?
Thx
Best regards
Hamza
I may have found my problem. We have a Scala wrapper on top of spark-submit
to run the shell command through Scala.
We were kind of eating the exit code from spark-submit in that wrapper.
When I stripped away the wrapper and looked at what the actual exit code
was, I got 1.
So I think spark-submit
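A sketch of the wrapper fix being described (the command line is
hypothetical): scala.sys.process hands back spark-submit's exit code, and
the wrapper just has to propagate it instead of discarding it.

import scala.sys.process._

// ! runs the command and returns its exit code.
val exitCode = Seq("spark-submit", "--class", "Main", "app.jar").!
sys.exit(exitCode)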
Hi,
➜ spark git:(master) ✗ ./bin/spark-submit whatever || echo $?
Error: Cannot load main class from JAR file:/Users/jacek/dev/oss/spark/whatever
Run with --help for usage help or --verbose for debug output
1
I see 1 and there are other cases for 1 too.
Pozdrawiam,
Jacek Laskowski
Hello,
+1, I have exactly the same issue. I need the exit code to make a decision
on Oozie executing actions. spark-submit always returns 0 when catching the
exception. From Spark 1.5 to 1.6.x, I still have the same issue... It would
be great to fix it or to know if there is some workaround
Hi,
An interesting case. You don't use Spark resources whatsoever.
Creating a SparkConf does not use YARN...yet. I think any run mode
would have the same effect. So, although spark-submit could have
returned exit code 1, the use case touches Spark very little.
What version is that? Do you see
println("all done!")
} catch {
case e: RuntimeException => {
println("There is an exception in the script exiting with status 1")
System.exit(1)
}
}
}
When I run this code using spark-submit I am expecting to get an exit code
of 1,
however I keep gett
...vide the results yourself.
I don't think it will be back-ported because the behavior was intended
in 1.x, just wrongly documented, and we don't want to change the behavior
in 1.x. The results are still correctly ordered anyway.
On Thu, Dec 29, 2016 at 10:11 PM Manish Tripathi <tr.man...@gmail.com>
wrote:
> Sean,
> Thanks for the answer. I am using Spark 1.6, so are you saying
Cosine similarity is dot(A,B)/(norm(A)*norm(B)); since norm = 1 it is just
dot(A,B). If we don't normalize, it would have a norm in the denominator,
so the output is the same.
But I understand you are saying that in Spark 1.x one vector was not
normalized. If that is the case then it makes sense.
Any idea how to fix this (get the right
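A worked illustration of that normalization point: with unit vectors,
cosine similarity reduces to the dot product and stays within [-1, 1];
leave one side unnormalized and the "similarity" can exceed 1.

val a = Array(0.6, 0.8) // norm(a) = 1
val b = Array(2.0, 0.0) // norm(b) = 2, i.e. not normalized
val dot = a.zip(b).map { case (x, y) => x * y }.sum
println(dot) // 1.2: exceeds 1 because we never divided by norm(b)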
I used the word2vec algorithm of Spark to compute document vectors of a
text. I then used the findSynonyms function of the model object to get
synonyms of a few words.
I see something like this:
[screenshot omitted in the archive]
I do not understand why the cosine similarity is being calculated as more
than 1. Cosine similarity
22:32:35,276 (dag-scheduler-event-loop) DEBUG [o.a.s.s.TaskSetManager] -
Valid locality levels for TaskSet 1.0: PROCESS_LOCAL, NODE_LOCAL,
RACK_LOCAL, ANY
22:32:35,288 (dispatcher-event-loop-20) INFO [o.a.s.s.TaskSetManager] -
Starting task 1.0 in stage 1.0 (TID 37, localhost, partition 1,
PROCESS_LOCAL
uot;, "I heard", "heard
about", "about Spark")
Currently if I want to do it I will have to manually transform column first
using current ngram implementation then join 1-gram tokens to each column
value. basically I have to do this outside of pipeline.
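For context, a sketch of the existing NGram transformer being referred to;
it reproduces the example output above:

import org.apache.spark.ml.feature.NGram

val df = spark.createDataFrame(Seq(
  (0, Seq("I", "heard", "about", "Spark"))
)).toDF("id", "tokens")

val ngram = new NGram().setN(2).setInputCol("tokens").setOutputCol("ngrams")
ngram.transform(df).select("ngrams").show(truncate = false)
// [I heard, heard about, about Spark]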
I get a few warnings like this in Spark 2.0.1 when using
org.apache.spark.mllib.recommendation.ALS:
WARN org.apache.spark.executor.Executor - 1 block locks were not released
by TID = 1448:
[rdd_239_0]
What can be the reason for that?
on sfa.snum = sf1.snum " +
" join ann at on at.anum = sfa.anum AND at.atypenum = 11 " +
" join data dr on r.rnum = dr.rnum " +
" join cit cd on dr.dnum = cd.dnum " +
" join cit on cd.cnum = ci.cnum " +
Use the DominantResourceCalculator instead of the default resource
calculator to get the expected vcores. Basically, by default YARN does not
honor CPU cores as a resource, so you will always see vcore = 1 no matter
what number of cores you set in Spark. (With the capacity scheduler, this
is the yarn.scheduler.capacity.resource-calculator property in
capacity-scheduler.xml.)
Hi All,
I am trying to run a Spark job using YARN, and I specify --executor-cores
as 20.
But when I check the "Nodes of the cluster" page at
http://hostname:8088/cluster/nodes, I see 4 containers getting created on
each of the nodes in the cluster, but can only see 1 vcore
I am concerned that this will reduce concurrency.
Thanks
Andy
From: Ted Yu <yuzhih...@gmail.com>
Date: Friday, July 22, 2016 at 2:54 PM
To: Andrew Davidson <a...@santacruzintegration.com>
Cc: "user @spark" <user@spark.apache.org>
Subject: Re: Exception in thr
node has 6G.
Any suggestions would be greatly appreciated
Andy
def work():
    constituentDFS = getDataFrames(constituentDataSets)
    results = ["{} {}".format(name, constituentDFS[name].count())
               for name in constituentDFS]
    print(results)
    return results
%timeit -n 1 -r 1 results = work()
Did you check this:
case class Example(name : String, age ; Int)
There is a semicolon; it should have been (age : Int).
> ...Metrics
> import org.apache.spark.{SparkConf, SparkContext}
>
> /**
>  * Created by sneha.shukla on 17/06/16.
>  */
> object TestCode {
>
>   def main(args: Array[String]): Unit = {
>
>     val sparkConf = new SparkConf().setAppName("HBaseRead")