Hi,
I have submitted a Spark job with options jars, class, and master as *local*, but
I am getting the error below:
*dse spark-submit spark error exception in thread main java.io.ioexception:
Invalid Request Exception(Why you have not logged in)*
*Note: submitting from a DataStax Spark node*
please let me kn
To clarify, that is the number of executors requested by the SparkContext
from the cluster manager.
On Tue, Jul 28, 2015 at 5:18 PM, amkcom wrote:
> try sc.getConf.getInt("spark.executor.instances", 1)
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.c
https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/FailureSuite.scala
This may help.
On Thu, Jul 30, 2015 at 10:42 PM, bit1...@163.com wrote:
> The following is copied from the paper; it relates to RDD
> lineage. Is there a unit test that covers this sce
The following is copied from the paper; it relates to RDD lineage.
Is there a unit test that covers this scenario (RDD partition lost and recovered)?
Thanks.
If a partition of an RDD is lost, the RDD has enough information about how it
was derived from other RDDs to recompute
just th
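The recovery the paper describes can be sketched in plain Python. This is only an illustration of the lineage idea, not Spark's implementation; the `Partition` class and its methods are made-up names.

```python
# Illustrative sketch of RDD lineage (not Spark code): a partition keeps a
# reference to its parent and the function that derived it, so lost data
# can be recomputed from the lineage instead of being replicated.

class Partition:
    def __init__(self, parent=None, fn=None, data=None):
        self.parent = parent  # parent partition; None for a source partition
        self.fn = fn          # per-element transformation from the parent
        self.data = data      # materialized data; None means "lost"

    def compute(self):
        # Recompute from the lineage if the materialized data was lost.
        if self.data is None:
            self.data = [self.fn(x) for x in self.parent.compute()]
        return self.data

source = Partition(data=[1, 2, 3])
doubled = Partition(parent=source, fn=lambda x: x * 2)
shifted = Partition(parent=doubled, fn=lambda x: x + 1)

print(shifted.compute())  # [3, 5, 7]

# Simulate losing the derived partitions; lineage rebuilds them.
doubled.data = None
shifted.data = None
print(shifted.compute())  # [3, 5, 7]
```

Only the source data and the chain of transformations need to survive for the derived partitions to be recoverable.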
I have run into similar exceptions:
ERROR DirectKafkaInputDStream: ArrayBuffer(java.net.SocketTimeoutException,
org.apache.spark.SparkException: Couldn't find leader offsets for
Set([AdServe,1]))
and the issue has happened on the Kafka side, where my broker offsets go out of
sync, or do not return l
I suggest that non-Spark-related discussions be taken to a different
forum, as they do not directly contribute to the contents of this user
list.
On Thu, Jul 30, 2015 at 8:52 PM, UMESH CHAUDHARY
wrote:
> Thanks for the valuable suggestion.
> I also started with JAX-RS restful service
Thanks TD and Zhihong for the guide. I will check it
bit1...@163.com
From: Tathagata Das
Date: 2015-07-31 12:27
To: Ted Yu
CC: bit1...@163.com; user
Subject: Re: How RDD lineage works
You have to read the original Spark paper to understand how RDD lineage works.
https://www.cs.berkeley.edu/~
Are you running it on YARN? Might be a known YARN credential expiring
issue.
Hari, any insights?
On Thu, Jul 30, 2015 at 4:04 AM, Guillermo Ortiz
wrote:
> I'm executing a job with Spark Streaming and get this error every time once
> the job has been executing for a while (usually hours or days).
How are you figuring out that the receiver isn't running? What does the
streaming UI say about the location of the receivers? Also, what version
of Spark are you using?
On Sun, Jul 26, 2015 at 8:47 AM, Ashwin Giridharan
wrote:
> Hi Aviemzur,
>
> As of now the Spark workers just run indefinitel
You have to read the original Spark paper to understand how RDD lineage
works.
https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
On Thu, Jul 30, 2015 at 9:25 PM, Ted Yu wrote:
> Please take a look at:
> core/src/test/scala/org/apache/spark/CheckpointSuite.scala
>
> Cheers
>
> On Thu,
Please take a look at:
core/src/test/scala/org/apache/spark/CheckpointSuite.scala
Cheers
On Thu, Jul 30, 2015 at 7:39 PM, bit1...@163.com wrote:
> Hi,
>
> I don't get a good understanding how RDD lineage works, so I would ask
> whether spark provides a unit test in the code base to illustrate h
Hi,
After I create a table in Spark SQL and load an HDFS file into it with
LOAD DATA INFILE, the file no longer shows up when I do hadoop fs -ls.
Is this expected?
Thanks,
Ron
Thanks for the valuable suggestion.
I also started with JAX-RS restful service with Angular.
Since play can help a lot in my scenario, I would prefer it along with
Angular. Is this combination fine over d3?
You might want to run spark-submit with option "--deploy-mode cluster"
On Thursday, July 30, 2015 7:24 PM, Marcelo Vanzin
wrote:
Can you share the part of the code in your script where you create the
SparkContext instance?
On Thu, Jul 30, 2015 at 7:19 PM, fordfarline wrote:
Hi A
Hi,
I don't have a good understanding of how RDD lineage works, so I would ask whether
Spark provides a unit test in the code base to illustrate how RDD lineage works.
If there is, what is the class name?
Thanks!
bit1...@163.com
Can you share the part of the code in your script where you create the
SparkContext instance?
On Thu, Jul 30, 2015 at 7:19 PM, fordfarline wrote:
> Hi All,
>
> I'm having an issue when launching an app (Python) against a standalone
> cluster: it runs in local mode instead, as it doesn't reach the cluster.
>
Hi All,
I'm having an issue when launching an app (Python) against a standalone
cluster: it runs in local mode instead, as it doesn't reach the cluster.
It's the first time I've tried the cluster; in local mode it works fine.
I did this:
-> /home/user/Spark/spark-1.3.0-bin-hadoop2.4/sbin/start-all.sh # Master and
worke
Umesh,
You can create a web service in any of the languages supported by Spark and
stream the result from this web service to your D3-based client using WebSockets
or Server-Sent Events.
For example, you can create a web service using Play. This app will integrate
with Spark Streaming in the back
Hello,
I have a simple file of mobile IDFAs. They look like:
gregconv = ['00013FEE-7561-47F3-95BC-CA18D20BCF78',
'000D9B97-2B54-4B80-AAA1-C1CB42CFBF3A',
'000F9E1F-BC7E-47E1-BF68-C68F6D987B96']
I am trying to make this RDD into a DataFrame:
ConvRecord = Row("IDFA")
gregconvdf = gregconv.map(lambda
Dear Michael, dear all,
distinguishing those records that have a match in mapping from those that
don't is the crucial point.
Record(x : Int, a: String)
Mapping(x: Int, y: Int)
Thus
Record(1, "hello")
Record(2, "bob")
Mapping(2, 5)
yields (2, "bob", 5) on an inner join.
BUT I'm also interested
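In plain Python terms (a hedged sketch, not the Spark API), the inner versus left outer distinction for the Record/Mapping example above looks like this:

```python
# Pure-Python sketch of the Record/Mapping example: an inner join drops
# unmatched records, a left outer join keeps them with a None placeholder.
records = [(1, "hello"), (2, "bob")]
mappings = {2: 5}

inner = [(x, a, mappings[x]) for (x, a) in records if x in mappings]
left_outer = [(x, a, mappings.get(x)) for (x, a) in records]

print(inner)       # [(2, 'bob', 5)]
print(left_outer)  # [(1, 'hello', None), (2, 'bob', 5)]
```

The left outer result is what lets you distinguish matched records from unmatched ones: the unmatched row carries None (null, in DataFrame terms) in the joined column.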
The first time, it needs to list them. After that, the list should be
cached by the file stream implementation (as far as I remember).
On Thu, Jul 30, 2015 at 3:55 PM, Brandon White
wrote:
> Is this a known bottle neck for Spark Streaming textFileStream? Does it
> need to list all the current
If anyone's curious, the issue here is that I was using the 1.2.4 connector of
the datastax spark Cassandra connector, rather than the 1.4.0-M1 pre-release.
1.2.4 doesn't fully support data frames, and it's presumably still only
experimental in 1.4.0-M1.
Ben
From: Benjamin Ross
Sent: Thursda
Is this a known bottle neck for Spark Streaming textFileStream? Does it
need to list all the current files in a directory before it gets the new
files? Say I have 500k files in a directory, does it list them all in order
to get the new files?
Yes, and that is indeed the problem. It is trying to process all the data
in Kafka, and therefore taking 60 seconds. You need to set the rate limits
for that.
On Thu, Jul 30, 2015 at 8:51 AM, Cody Koeninger wrote:
> If you don't set it, there is no maximum rate, it will get everything from
> the
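The rate limit referred to here is controlled by a configuration property for the direct Kafka stream. As a hedged sketch (the value 1000 is only illustrative, not a recommendation), it can be set in spark-defaults.conf or passed via --conf:

```
# spark-defaults.conf -- cap how many records each Kafka partition
# contributes per second to a direct-stream batch (value is illustrative)
spark.streaming.kafka.maxRatePerPartition  1000
```

With this set, the first batch after a restart is bounded instead of trying to consume the whole backlog at once.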
Hi,
I am new to using Spark and Parquet files.
Below is what I am trying to do, on spark-shell:
val df =
sqlContext.parquetFile("/data/LM/Parquet/Segment/pages/part-m-0.gz.parquet")
Have also tried below command,
val
df=sqlContext.read.format("parquet").load("/data/LM/Parquet/Segment/pages/
You can't use these names due to limitations in Parquet (and the library
itself will silently generate corrupt files that can't be read, hence the
error we throw).
You can alias a column with df.select(df("old").alias("new")), which is
essentially what withColumnRenamed does. Alias in this case mean
Perhaps I'm missing what you are trying to accomplish, but if you'd like to
avoid the null values do an inner join instead of an outer join.
Additionally, I'm confused about how the result of joinedDF.filter(joinedDF(
"y").isNotNull).show still contains null values in the column y. This
doesn't re
https://www.youtube.com/watch?v=JncgoPKklVE
On Thu, Jul 30, 2015 at 1:30 PM, wrote:
>
>
> --
>
> This message is for the designated recipient only and may contain
> privileged, proprietary, or otherwise confidential information. If you have
> received it in error, ple
This message is for the designated recipient only and may contain privileged,
proprietary, or otherwise confidential information. If you have received it in
error, please notify the sender immediately and delete the original. Any other
use of the e-mail by you
I'm submitting the application this way:
spark-submit test-2.0.5-SNAPSHOT-jar-with-dependencies.jar
I've confirmed that org.apache.spark.sql.cassandra and org.apache.cassandra
classes are in the jar.
Apologies for this relatively newbie question - I'm still new to both spark and
scala.
Thanks,
Looking at:
https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-NightlyBuilds
Maven artifacts should be here:
https://repository.apache.org/content/groups/snapshots/org/apache/spark/spark-core_2.10/1.5.0-SNAPSHOT/
though the jars were dated July 16th.
FY
Hey all,
I'm running what should be a very straight-forward application of the Cassandra
sql connector, and I'm getting an error:
Exception in thread "main" java.lang.RuntimeException: Failed to load class for
data source: org.apache.spark.sql.cassandra
at scala.sys.package$.error(packag
Hi Ted,
The problem is that I don't know whether the build uses the commits from
the same day, or whether it may be built from the Jul 15th commits.
Just a thought: it might be possible to replace "SNAPSHOT" with the git
commit hash in the filename so people will know which commit it is based on.
Hi,
I have just started learning DataFrames in Spark and encountered the
following error.
I am creating the following:
*case class A(a1:String,a2:String,a3:String)*
*case class B(b1:String,b2:String,b3:String)*
*case class C(key:A,value:Seq[B])*
Now I have to do a DF with struc
("key" :{..},"value":{.
Dear Michael, dear all,
motivation:
object OtherEntities {
case class Record( x:Int, a: String)
case class Mapping( x: Int, y: Int )
val records = Seq( Record(1, "hello"), Record(2, "bob"))
val mappings = Seq( Mapping(2, 5) )
}
Now I want to perform a *left outer join* on records and
Hi all,
Our data has lots of human-readable column names (names that include
spaces); is it possible to use these with Parquet and DataFrames?
When I try and write the Dataframe I get the following error:
(I am using PySpark)
`AnalysisException: Attribute name "Name with Space" contains invalid
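Since Parquet rejects characters such as spaces in attribute names, the columns have to be renamed before writing. Below is a hedged pure-Python sketch of the renaming logic only; in PySpark the rename itself would be done with select/alias, and the exact character set is an assumption based on the error message, not an authoritative list:

```python
import re

def clean(name):
    # Replace characters Parquet typically rejects in column names
    # (space and assorted punctuation) with underscores. Assumed set.
    return re.sub(r"[ ,;{}()\n\t=]", "_", name)

columns = ["Name with Space", "Plain"]
print([clean(c) for c in columns])  # ['Name_with_Space', 'Plain']
```

You would apply something like this to every column name before the Parquet write, keeping a mapping back to the human-readable names for display.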
We don't yet update nullability information based on predicates, as we
don't actually leverage this information in many places yet. Why do you
want to update the schema?
On Thu, Jul 30, 2015 at 11:19 AM, martinibus77
wrote:
> Hi all,
>
> 1. *Columns in dataframes can be nullable and not nullabl
Hi all,
1. *Columns in dataframes can be nullable or not nullable. Having a
nullable column of Doubles, I can use the following Scala code to select all
"non-null" rows:*
val df = . // some code that creates a DataFrame
df.filter( df("columnname").isNotNull() )
You need to fix your configuration so that the resource manager hostname/URL is
set... that address is the "listen on any port" path.
On 30 Jul 2015, at 10:47, Nirav Patel
mailto:npa...@xactlycorp.com>> wrote:
15/07/29 11:19:26 INFO client.RMProxy: Connecting to ResourceManager at
/0.0.0.0:
Hi, I was running a Spark application as a query service (much like
spark-shell, but within my servlet container provided by Spring Boot) with
Spark 1.0.2 and standalone mode. Now, after upgrading to Spark 1.3.1 and
trying to use YARN instead of a standalone cluster, things are going south for me.
I created u
Hello spark users!
I am having some trouble with TF-IDF in MLlib and was wondering if
anyone can point me in the right direction.
The data ingestion and the initial term frequency count code taken from the
example works fine (I am using the first example from this page:
https://spark.apache.o
Thanks, Jorn, for the response and for the pointers to Hive
optimization tips.
I believe I have done the possible and applicable things to improve Hive
query performance, including but not limited to running on Tez, using
partitioning, bucketing, and using explain to make sure partition pruning
Hi Praveen,
In MLLib, the major difference is that RandomForestClassificationModel
makes use of a newer API which utilizes ML pipelines. I can't say for
certain if they will produce the same exact result for a given dataset, but
I believe they should.
Bryan
On Wed, Jul 29, 2015 at 12:14 PM, pra
What is your cluster configuration (size and resources)?
If you do not have enough resources, your executors will not run.
Moreover, allocating 8 cores to an executor is too much.
If you have a cluster with four nodes running NodeManagers, each equipped
with 4 cores and 8 GB of memory,
then a
See past thread on this topic:
http://search-hadoop.com/m/q3RTt0NZXV1cC6q02
On Thu, Jul 30, 2015 at 9:08 AM, unk1102 wrote:
> Hi I have one Spark job which runs fine locally with less data but when I
> schedule it on YARN to execute I keep on getting the following ERROR and
> slowly all executo
There is a workaround:
sc.parallelize(items, items.size / 2)
This way each executor will get a batch of 2 items at a time, simulating a
producer-consumer. With / 4 it will get 4 items.
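The workaround above relies on how a list is dealt out across partitions. A pure-Python sketch of that batching (an approximation, not Spark's actual partitioner):

```python
def split_into_partitions(items, num_partitions):
    # Deal out contiguous slices, spreading any remainder over the
    # first partitions, roughly the way sc.parallelize slices a list.
    size, rem = divmod(len(items), num_partitions)
    parts, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < rem else 0)
        parts.append(items[start:end])
        start = end
    return parts

items = list(range(8))
print(split_into_partitions(items, len(items) // 2))
# [[0, 1], [2, 3], [4, 5], [6, 7]]
```

Passing len(items) // 2 as the partition count yields batches of about two items each, which is what gives the producer-consumer effect described above.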
Hi, I have one Spark job which runs fine locally with less data, but when I
schedule it on YARN I keep getting the following ERROR, and
slowly all executors get removed from the UI and my job fails:
15/07/30 10:18:13 ERROR cluster.YarnScheduler: Lost executor 8 on
myhost1.com: remote Rpc cl
Yes, thanks, that sorted out the issue.
On 30/07/15 09:26, Akhil Das wrote:
sc.parallelize takes a second parameter which is the total number of
partitions, are you using that?
Thanks
Best Regards
On Wed, Jul 29, 2015 at 9:27 PM, Kostas Kougios
mailto:kostas.koug...@googlemail.com>>
wrote:
If you don't set it, there is no maximum rate, it will get everything from
the end of the last batch to the maximum available offset
On Thu, Jul 30, 2015 at 10:46 AM, Guillermo Ortiz
wrote:
> The difference is that one recives more data than the others two. I can
> pass thought parameters the to
The difference is that one receives more data than the other two. I can
pass the topics through parameters, so I could execute the code
with one topic at a time and figure out which topic it is, although I guess
it's the topic which gets more data.
Anyway, it's pretty weird, those delays in
Jerry:
See this recent thread:
http://search-hadoop.com/m/q3RTtRygSv1PEdCr
Cheers
On Thu, Jul 30, 2015 at 8:10 AM, Ted Yu wrote:
> The files were dated 16-Jul-2015
> Looks like nightly build either was not published, or published at a
> different location.
>
> You can download spark-1.5.0-SNAPS
Hi Akhil,
Yes, I did try to remove it, and I tried to build again.
However, that jar keeps getting recreated whenever I run ./build/sbt
assembly
Thanks,
Rahul P
On Thu, Jul 30, 2015 at 12:38 AM, Akhil Das
wrote:
> Did you try removing this jar? build/sbt-launch-0.13.7.jar
>
> Thanks
> Best Reg
The files were dated 16-Jul-2015.
It looks like the nightly build either was not published, or was published at a
different location.
You can download spark-1.5.0-SNAPSHOT.tgz and binary-search for the commits
made on Jul 16th.
There may be other ways of determining the latest commit.
Cheers
On Thu, Jul 30,
I registered my class with Kryo in spark-defaults.conf as follows:
spark.serializer
org.apache.spark.serializer.KryoSerializer
spark.kryo.registrationRequired true
spark.kryo.classesToRegister ltn.analytics.es.EsDoc
But I got the following excep
Hi Spark users and developers,
I wonder which git commit was used to build the latest master-nightly build
found at:
http://people.apache.org/~pwendell/spark-nightly/spark-master-bin/latest/?
I downloaded the build but I couldn't find the information related to it.
Thank you!
Best Regards,
Jerry
If the jobs are running on different topicpartitions, what's different
about them? Is one of them 120x the throughput of the other, for
instance? You should be able to eliminate cluster config as a difference
by running the same topic partition on the different clusters and comparing
the results.
I have three topics with one partition each, so each job runs on one
topic.
2015-07-30 16:20 GMT+02:00 Cody Koeninger :
> Just so I'm clear, the difference in timing you're talking about is this:
>
> 15/07/30 14:33:59 INFO DAGScheduler: Job 24 finished: foreachRDD at
> MetricsSpark.scal
Hi Cody, sorry, my bad: you were right, there was a typo in topicSet. When I
corrected the typo in topicSet it started working. Thanks a lot.
Regards
On Thu, Jul 30, 2015 at 7:43 PM, Cody Koeninger wrote:
> Can you post the code including the values of kafkaParams and topicSet,
> ideally the relevant o
You can't use checkpoints across code upgrades. That may or may not change
in the future, but for now that's a limitation of spark checkpoints
(regardless of whether you're using Kafka).
Some options:
- Start up the new job on a different cluster, then kill the old job once
it's caught up to whe
I am getting the same error. Any resolution on this issue?
Thank you
Just so I'm clear, the difference in timing you're talking about is this:
15/07/30 14:33:59 INFO DAGScheduler: Job 24 finished: foreachRDD at
MetricsSpark.scala:67, took 60.391761 s
15/07/30 14:37:35 INFO DAGScheduler: Job 93 finished: foreachRDD at
MetricsSpark.scala:67, took 0.531323 s
Are th
Can you post the code including the values of kafkaParams and topicSet,
ideally the relevant output of kafka-topics.sh --describe as well
On Wed, Jul 29, 2015 at 11:39 PM, Umesh Kacha wrote:
> Hi thanks for the response. Like I already mentioned in the question kafka
> topic is valid and it has
Thanks for the information; this fixed the issue. The issue was in the Spark
master memory: when I specified 1G manually for the master, it started working.
On 30 July 2015 at 14:26, Shao, Saisai wrote:
> You’d better also check the log of nodemanager, sometimes because your
> memory usage exceeds the limit of Yarn c
Herman:
For "Pre-built with user-provided Hadoop", spark-1.4.1-bin-hadoop2.6.tgz,
e.g., uses hadoop-2.6 profile which defines versions of projects Spark
depends on.
Hadoop cluster is used to provide storage (hdfs) and resource management
(YARN).
For the latter, please see:
https://spark.apache.org
I am having the same issue. Have you found any resolution?
You can create a custom receiver, and inside it you can write your own
piece of code to receive data, filter it, etc. before giving it to Spark.
Thanks
Best Regards
On Thu, Jul 30, 2015 at 6:49 PM, Sadaf Khan wrote:
> okay :)
>
> then is there anyway to fetch the tweets specific to my accoun
Hi @rok, thanks, I got it.
Oh... That was embarrassingly easy!
Thank you that was exactly the understanding of partitions that I needed.
P
On Thu, Jul 30, 2015 at 6:35 AM, Simon Elliston Ball <
si...@simonellistonball.com> wrote:
> You might also want to consider broadcasting the models to ensure you get
> one instance
Hi,
I find the documentation about the configuration options rather confusing.
There are a lot of files, and it is not clear where to modify what; for
example, spark-env vs. spark-defaults.
I am getting an error with a Python version collision:
File "/root/spark/python/lib/pyspark.zip/pyspark/wo
I read about the maxRatePerPartition parameter; I haven't set it.
Could that be the problem? Although this wouldn't explain why it doesn't
work in only one of the clusters.
2015-07-30 14:47 GMT+02:00 Guillermo Ortiz :
> They just share the kafka, the rest of resources are independents. I tried
Did you use an absolute path in $path_to_file? I just tried this with
spark-shell v1.4.1 and it worked for me. If the URL is wrong, you should
see an error message from log4j that it can't find the file. For windows it
would be something like file:/c:/path/to/file, I believe.
Dean Wampler, Ph.D.
A
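A hedged sketch of a full invocation with an absolute path, as suggested above (the path, class name, and jar are placeholders, not from the original message):

```
spark-submit \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/absolute/path/log4j.properties" \
  --class com.example.Main \
  app.jar
```

If the URL is wrong, log4j itself should print a warning that it can't find the file, which is a quick way to verify the path is being picked up.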
They just share Kafka; the rest of the resources are independent. I tried
stopping one cluster and executing only the cluster that isn't working, but
the same thing happens.
2015-07-30 14:41 GMT+02:00 Guillermo Ortiz :
> I have some problem with the JobScheduler. I have executed same code in
> two cluster.
I have some problems with the JobScheduler. I have executed the same code in two
clusters. I read from three topics in Kafka with DirectStream, so I have
three tasks.
I have checked YARN and there are no other jobs launched.
In the cluster where I have trouble I got these logs:
15/07/30 14:32:58 INFO TaskSet
TwitterUtils.createStream takes a third argument which is a filter; once you
provide it, it will only fetch matching tweets.
Thanks
Best Regards
On Thu, Jul 30, 2015 at 4:19 PM, Sadaf wrote:
> Hi.
> I am writing twitter connector using spark streaming. but it fetched the
> random tweets.
> Is t
Yes, I was doing the same. If you mean that this is the correct way to do
it, then I will verify it once more in my case.
On Thu, Jul 30, 2015 at 1:02 PM, Tathagata Das wrote:
> How is sleep not working? Are you doing
>
> streamingContext.start()
> Thread.sleep(xxx)
> streamingContext.stop()
>
>
I have prepared some interview questions:
http://www.knowbigdata.com/blog/interview-questions-apache-spark-part-1
http://www.knowbigdata.com/blog/interview-questions-apache-spark-part-2
please provide your feedback.
On Wed, Jul 29, 2015, 23:43 Pedro Rodriguez wrote:
> You might look at the edx
I'm executing a job with Spark Streaming and get this error every time once
the job has been executing for a while (usually hours or days).
I have no idea why it's happening.
15/07/30 13:02:14 ERROR LiveListenerBus: Listener EventLoggingListener
threw an exception
java.lang.reflect.InvocationTarge
Hi.
I am writing a Twitter connector using Spark Streaming, but it fetches
random tweets.
Is there any way to receive the tweets of a particular account?
I made an app on twitter and used the credentials as given below.
def managingCredentials(): Option[twitter4j.auth.Authorization]=
{
You might also want to consider broadcasting the models to ensure you get one
instance shared across cores in each machine; otherwise the model will be
serialised to each task and you'll get a copy per executor (roughly one per
core in this instance).
Simon
Sent from my iPhone
> On 30 Jul 2015, at 10
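The difference described above can be sketched in plain Python: closure capture ships and deserializes a copy of the model into each task, while a broadcast-style handle shares one cached value. This is only an illustration (the `Broadcast` class here is made up, not Spark's implementation):

```python
import pickle

model = {"weights": [0.1, 0.2, 0.3]}

# Without broadcast: the model travels inside each task's serialized
# closure, so every task deserializes its own copy.
serialized = pickle.dumps(model)
per_task_copies = [pickle.loads(serialized) for _ in range(4)]
print(all(copy is not model for copy in per_task_copies))  # True

# With broadcast: tasks hold a small handle and share one cached value.
class Broadcast:
    def __init__(self, value):
        self._value = value  # shipped once, then cached per process

    @property
    def value(self):
        return self._value

handle = Broadcast(model)
print(all(handle.value is model for _ in range(4)))  # True: one instance
```

For a large model, the broadcast-style approach saves both serialization time and memory, since only the small handle travels with each task.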
I saw such an example in the docs:
--conf
"spark.driver.extraJavaOptions=-Dlog4j.configuration=file://$path_to_file"
but, unfortunately, it does not work for me.
On 30.07.2015 05:12, canan chen wrote:
Yes, that should work. What I mean is is there any option in
spark-submit command that I can specify
I've imported a Json file which has this schema :
sqlContext.read.json("filename").printSchema
root
 |-- COL: long (nullable = true)
 |-- DATA: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- Crate: string (nullable = true)
You can integrate it with any language (like php) and use ajax calls to
update the charts.
Thanks
Best Regards
On Thu, Jul 30, 2015 at 2:11 PM, UMESH CHAUDHARY
wrote:
> Thanks For the suggestion Akhil!
> I looked at https://github.com/mbostock/d3/wiki/Gallery to know more
> about d3, all exampl
Hi,
I've read about the recent updates about spark-streaming integration with
Kafka (I refer to the new approach without receivers).
In the new approach, metadata are persisted in checkpoint folders on HDFS
so that the SparkStreaming context can be recreated in case of failures.
This means that the
Hi,
Just my two cents. I understand that your problem is that
you have messages with the same key in two different DStreams. What I would
do is make a union of all the DStreams with StreamingContext.union
or several calls to DStream.union, and then I would create a pair dst
You'd better also check the log of the NodeManager; sometimes it is because your
memory usage exceeds the limit of the YARN container's configuration.
I’ve met similar problem before, here is the warning log in nodemanager:
2015-07-07 17:06:07,141 WARN
org.apache.hadoop.yarn.server.nodemanager.containermanag
Hi All,
Can someone offer insights on this?
On Wed, Jul 29, 2015 at 8:29 AM, Priya Ch
wrote:
>
>
> Hi TD,
>
> Thanks for the info. I have the scenario like this.
>
> I am reading the data from kafka topic. Let's say kafka has 3 partitions
> for the topic. In my streaming application, I woul
Hi.
I want to run Spark, and more specifically the "Pre-built with user-provided
Hadoop" version from the downloads page, but I can't find any documentation
on how to connect the two components (namely Spark and Hadoop) together.
I've had some success in setting SPARK_CLASSPATH to my hadoop dist
Thanks for the suggestion, Akhil!
I looked at https://github.com/mbostock/d3/wiki/Gallery to learn more about
d3. All the examples described there are on static data; how can we update our
heat map from updated data if we store it in HBase or MySQL? I mean, do we
need to query back and forth for it?
Is
Spark is a computing engine; I don't think it is suitable for Spark to do
large input file transfers.
On Thu, Jul 30, 2015 at 4:18 PM, Akhil Das
wrote:
> You can achieve this with a Network File System, create an NFS on the
> cluster and mount it to your local machine.
>
> Thanks
> Best R
>> 15/07/30 12:13:35 ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15:
SIGTERM
The AM is killed somehow, maybe due to preemption. Does it always happen?
The resource manager log would be helpful.
On Thu, Jul 30, 2015 at 4:17 PM, Jeetendra Gangele
wrote:
> I can't see the application logs here. All the
sc.parallelize takes a second parameter which is the total number of
partitions, are you using that?
Thanks
Best Regards
On Wed, Jul 29, 2015 at 9:27 PM, Kostas Kougios <
kostas.koug...@googlemail.com> wrote:
> Hi, I do an sc.parallelize with a list of 512k items. But sometimes not all
> executo
You can achieve this with a Network File System, create an NFS on the
cluster and mount it to your local machine.
Thanks
Best Regards
On Wed, Jul 29, 2015 at 9:32 AM, Anh Hong
wrote:
> Hi,
>
> I'm using spark-submit cluster mode to submit a job from local to spark
> cluster. There are input f
I can't see the application logs here. All the logs are going to stderr.
Can anybody help here?
On 30 July 2015 at 12:21, Jeetendra Gangele wrote:
> I am running below command this is default spark PI program but this is
> not running all the log are going in stderr but at the terminal job is
Like this?
val data = sc.textFile("/sigmoid/audio/data/", 24).foreachPartition(urls =>
speachRecognizer(urls))
Let 24 be the total number of cores that you have on all the workers.
Thanks
Best Regards
On Wed, Jul 29, 2015 at 6:50 AM, Peter Wolf wrote:
> Hello, I am writing a Spark application
You can easily push data to an intermediate storage from spark streaming
(like HBase or a SQL/NoSQL DB etc) and then power your dashboards with d3
js.
Thanks
Best Regards
On Tue, Jul 28, 2015 at 12:18 PM, UMESH CHAUDHARY
wrote:
> I have just started using Spark Streaming and done few POCs. It i
What operation are you doing with streaming? Also, can you look in the
datanode logs and see what's going on?
Thanks
Best Regards
On Tue, Jul 28, 2015 at 8:18 AM, guoqing0...@yahoo.com.hk <
guoqing0...@yahoo.com.hk> wrote:
> Hi,
> I got a error when running spark streaming as below .
>
> java.lang
It seems to be an issue with the ES connector:
https://github.com/elastic/elasticsearch-hadoop/issues/482
Thanks
Best Regards
On Tue, Jul 28, 2015 at 6:14 AM, An Tran wrote:
> Hello all,
>
> I am currently having an error with Spark SQL access Elasticsearch using
> Elasticsearch Spark integration. Bel
Hi Ferriad,
Can you share the code? It's hard to judge the problem with this
little information.
Thank you
Regards
Himanshu
When you give master as *local[1]*, it occupies a single thread, which will
probably run on a single core; you can give *local[*]* to allocate the
total number of cores that you have.
Thanks
Best Regards
On Tue, Jul 28, 2015 at 1:27 AM, wrote:
> Hi all,
>
> would like some insight. I am currently comput
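As a hedged illustration of the master strings mentioned above (app.py is a placeholder, not from the original message):

```
spark-submit --master local[1] app.py     # one worker thread
spark-submit --master local[4] app.py     # four worker threads
spark-submit --master "local[*]" app.py   # one thread per available core
```

Quoting local[*] avoids shell glob expansion in some shells.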
Did you try removing this jar? build/sbt-launch-0.13.7.jar
Thanks
Best Regards
On Tue, Jul 28, 2015 at 12:08 AM, Rahul Palamuttam
wrote:
> Hi All,
>
> I hope this is the right place to post troubleshooting questions.
> I've been following the install instructions and I get the following error
>