and edit the *Path* variable to
add the *bin* directory of *HADOOP_HOME* (say *C:\hadoop\bin*)
to fix this issue in my env.
2015-05-21 9:55 GMT+03:00 Akhil Das ak...@sigmoidanalytics.com:
This thread is from a year back. Can you please share what issue you are
facing, and which version of Spark you are using?
Can you try commenting out the saveAsTextFile and doing a simple count()? If
it's a broadcast issue, then it would throw up the same error.
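For instance, a minimal sketch (result here is a stand-in for whatever RDD is being saved):

// instead of: result.saveAsTextFile("hdfs://...")
val n = result.count() // if the broadcast is the problem, this should fail the same way
println(n)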
On 21 May 2015 14:21, allanjie allanmcgr...@gmail.com wrote:
Sure, the code is very simple. I think you guys can understand it from the
main function.
public class
-packages.org/package/dibbhatt/kafka-spark-consumer
Regards,
Dibyendu
On Tue, May 19, 2015 at 9:00 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
On Tue, May 19, 2015 at 8:10 PM, Shushant Arora
shushantaror...@gmail.com wrote:
So for Kafka+spark streaming, Receiver based streaming used
of JavaPairRDD is as expected. It is when we call the collect() or toArray()
methods that the exception comes up. It is something to do with the Text
class, even though I haven't used it in the program.
Regards
Tapan
On Tue, May 19, 2015 at 6:26 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Try something
the rate with
spark.streaming.kafka.maxRatePerPartition)
Read more here
https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md
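For reference, a minimal sketch of capping the rate (the 1000 is just an illustrative records-per-second-per-partition limit):

val conf = new SparkConf()
  .set("spark.streaming.kafka.maxRatePerPartition", "1000")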
On Wed, May 20, 2015 at 12:36 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
One receiver basically runs on 1 core, so if your single node is having
-files
Regards
Tapan
On Wed, May 20, 2015 at 12:42 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
If you can share the complete code and a sample file, maybe I can try to
reproduce it on my end.
Thanks
Best Regards
On Wed, May 20, 2015 at 7:00 AM, Tapan Sharma tapan.sha
This looks more like an issue with your HDFS setup. Can you check the
datanode logs? Also try putting a new file into HDFS and see if that works.
Thanks
Best Regards
On Wed, May 20, 2015 at 11:47 AM, allanjie allanmcgr...@gmail.com wrote:
Hi All,
The variable I need to broadcast is just 468
Yes, this is the user group. Feel free to ask your questions in this list.
Thanks
Best Regards
On Wed, May 20, 2015 at 5:58 AM, Ricardo Goncalves da Silva
ricardog.si...@telefonica.com wrote:
Hi
I'm learning spark focused on data and machine learning. Migrating from
SAS.
There is a group
Hi Justin,
Can you try with sbt? Maybe that will help.
- Install sbt for windows
http://www.scala-sbt.org/0.13/tutorial/Installing-sbt-on-Windows.html
- Create a lib directory in your project directory
- Place these jars in it:
- spark-streaming-twitter_2.10-1.3.1.jar
-
Hi Peer,
If you open the driver UI (running on port 4040) you can see the stages and
the tasks happening inside it. The best way to identify the bottleneck for a
stage is to see if there's any time spent on GC, and how many tasks there
are per stage (it should be a number >= total # cores to achieve
There was some similar discussion on JIRA,
https://issues.apache.org/jira/browse/SPARK-3633; maybe that will give you
some insights.
Thanks
Best Regards
On Mon, May 18, 2015 at 10:49 PM, zia_kayani zia.kay...@platalytics.com
wrote:
Hi, I'm getting this exception after shifting my code
It will be a single job running at a time by default (you can also
configure spark.streaming.concurrentJobs to run jobs in parallel, which is
not recommended in production).
Now, with your batch duration being 1 sec and the processing time being 2
minutes, if you are using a receiver based
not be started at its desired interval.
And what's the difference and usage of receiver vs. non-receiver based
streaming? Is there any documentation for that?
On Tue, May 19, 2015 at 1:35 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
It will be a single job running at a time by default (you can also
in result.
Deciding where to save offsets (or not) is up to you. You can checkpoint,
or store them yourself.
On Mon, May 18, 2015 at 12:00 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
I have played a bit with the directStream kafka api. Good work, Cody.
These are my findings. And also, you can
specify the number of receivers that you want to spawn for
consuming the messages.
On Tue, May 19, 2015 at 2:38 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
spark.streaming.concurrentJobs takes an integer value, not a boolean. If
you set it as 2 then 2 jobs will run in parallel. Default value
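A minimal sketch of setting it (assuming the StreamingContext is built from a SparkConf):

val conf = new SparkConf().set("spark.streaming.concurrentJobs", "2") // run up to 2 jobs in parallel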
Try something like:
JavaPairRDD<IntWritable, Text> output = sc.newAPIHadoopFile(inputDir,
org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.class,
IntWritable.class,
Text.class, new Job().getConfiguration());
With the type of input format that you require.
Thanks
Best
assure you that at least as of Spark Streaming 1.2.0,
as Evo says, Spark Streaming DOES crash in an “unceremonious way” when the
free RAM available for in-memory cached RDDs gets exhausted
*From:* Akhil Das [mailto:ak...@sigmoidanalytics.com]
*Sent:* Monday, May 18, 2015 2:03 PM
*To:* Evo Eftimov
of the condition is:
Loss was due to java.lang.Exception
java.lang.Exception: *Could not compute split, block*
*input-4-1410542878200 not found*
*From:* Evo Eftimov [mailto:evo.efti...@isecc.com]
*Sent:* Monday, May 18, 2015 12:13 PM
*To:* 'Dmitry Goldenberg'; 'Akhil Das'
*Cc:* 'user@spark.apache.org
are processed in order (and offsets committed in order), etc.
So whoever uses whichever consumer needs to study the pros and cons of both
approaches before taking a call.
Regards,
Dibyendu
On Tue, May 12, 2015 at 8:10 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Hi Cody,
I was just
Streaming does “NOT” crash UNCEREMONIOUSLY – please maintain
responsible and objective communication and facts
*From:* Akhil Das [mailto:ak...@sigmoidanalytics.com]
*Sent:* Monday, May 18, 2015 2:28 PM
*To:* Evo Eftimov
*Cc:* Dmitry Goldenberg; user@spark.apache.org
*Subject:* Re: Spark
Why not use Spark Streaming to do the computation, dump the result
somewhere, in a DB perhaps, and take it from there?
Thanks
Best Regards
On Mon, May 18, 2015 at 7:51 PM, juandasgandaras juandasganda...@gmail.com
wrote:
Hello,
I would like to use spark streaming over a REST api to get
Did you try the --executor-cores param? While you submit the job, do a ps
aux | grep spark-submit and see the exact command parameters.
Thanks
Best Regards
On Sat, May 16, 2015 at 12:31 PM, xiaohe lan zombiexco...@gmail.com wrote:
Hi,
I have a 5 nodes yarn cluster, I used spark-submit to submit
You can either pull the high level information from your resource manager,
or if you want more control/specific information you can write a script and
pull the resource usage information from the OS. Something like this
I think you can try this way also:
DataFrame df =
sqlContext.load("s3n://ACCESS-KEY:SECRET-KEY@bucket-name/file.avro",
"com.databricks.spark.avro");
Thanks
Best Regards
On Sat, May 16, 2015 at 2:02 AM, Mohammad Tariq donta...@gmail.com wrote:
Thanks for the suggestion Steve. I'll try that out.
Why not just trigger your batch job with that event?
If you really need streaming, then you can create a custom receiver and
make the receiver sleep till the event has happened. That will obviously
run your streaming pipelines without having any data to process.
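A very rough sketch of such a receiver (the class name and the event check are made up; the Receiver API is the one from the custom-receivers guide):

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class EventGatedReceiver extends Receiver[String](StorageLevel.MEMORY_ONLY) {
  def onStart(): Unit = new Thread("event-wait") {
    override def run(): Unit = {
      while (!isStopped() && !eventHasHappened()) Thread.sleep(1000) // sleep till the event
      // once the event fires, start pushing data with store(...)
    }
  }.start()
  def onStop(): Unit = ()
  private def eventHasHappened(): Boolean = false // hypothetical event check
}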
Thanks
Best Regards
On Fri, May
With file timestamp, you can actually see the finding new files logic from
here
https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala#L172
Thanks
Best Regards
On Fri, May 15, 2015 at 2:25 AM, Vadim Bichutskiy
With receiver based streaming, you can actually
specify spark.streaming.blockInterval, which is the interval at which the
receiver chunks the received data into blocks. The default value is 200ms,
and hence if your batch duration is 1 second, it will produce 5 blocks of
data. And yes, with sparkstreaming
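A sketch of tuning it (in these versions the value is a plain millisecond number):

val conf = new SparkConf().set("spark.streaming.blockInterval", "200") // 200ms blocks; a 1s batch => 5 blocks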
What do you mean by not detected? Maybe you forgot to trigger some action
on the stream to get it executed, like:
val list_join_action_stream = ssc.fileStream[LongWritable, Text,
TextInputFormat](gc.input_dir, (t: Path) => true, false).map(_._2.toString)
*list_join_action_stream.count().print()*
Did you happen to have a look at the spark job server?
https://github.com/ooyala/spark-jobserver Someone wrote a python wrapper
https://github.com/wangqiang8511/spark_job_manager around it; give it a
try.
Thanks
Best Regards
On Thu, May 14, 2015 at 11:10 AM, MEETHU MATHEW
Can you share the client code that you used to send the data? Maybe this
discussion would give you some insights:
http://apache-avro.679487.n3.nabble.com/Avro-RPC-Python-to-Java-isn-t-working-for-me-td4027454.html
Thanks
Best Regards
On Thu, May 14, 2015 at 8:44 AM, 鹰 980548...@qq.com wrote:
at 1:04 PM, lisendong lisend...@163.com wrote:
I have action on DStream.
because when I put a text file into HDFS, it runs normally, but if I
put an lz4 file, it does nothing.
On May 14, 2015, at 3:32 PM, Akhil Das ak...@sigmoidanalytics.com wrote:
What do you mean by not detected? Maybe you forgot
Have a look at https://spark.apache.org/community.html
Send an email to user-unsubscr...@spark.apache.org
Thanks
Best Regards
On Thu, May 14, 2015 at 1:08 PM, Saurabh Agrawal saurabh.agra...@markit.com
wrote:
How do I unsubscribe from this mailing list please?
Thanks!!
Regards,
:
LzoTextInputFormat: where is this class?
What is the maven dependency?
On May 14, 2015, at 3:40 PM, Akhil Das ak...@sigmoidanalytics.com wrote:
That's because you are using TextInputFormat, I think. Try
with LzoTextInputFormat, like:
val list_join_action_stream = ssc.fileStream[LongWritable, Text
With this low-level Kafka API
https://github.com/dibbhatt/kafka-spark-consumer/, you can actually
specify how many receivers you want to spawn, and most of the time it
spawns them evenly. Usually you can put a sleep just after creating the
context for the executors to connect to the driver and then
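That pause would look roughly like this (the 5 seconds is an arbitrary value):

val ssc = new StreamingContext(conf, Seconds(1))
Thread.sleep(5000) // give executors time to connect to the driver before the receivers are created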
Did you happen to have a look at this: https://github.com/abashev/vfs-s3
Thanks
Best Regards
On Tue, May 12, 2015 at 11:33 PM, Stephen Carman scar...@coldlight.com
wrote:
We have a small mesos cluster and these slaves need to have a vfs setup on
them so that the slaves can pull down the data
This article http://www.virdata.com/tuning-spark/ gives you a pretty good
start on the Spark streaming side. And this article
https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
is for Kafka; it has a nice explanation of how message size and
Maybe you should check where exactly it's throwing the permission denied
(possibly trying to write to some directory). Also, you can try manually
cloning the git repo to a directory and then try opening that in eclipse.
Thanks
Best Regards
On Tue, May 12, 2015 at 3:46 PM, Chandrashekhar Kotekar
I believe fileStream would pick up the new files (maybe you should increase
the batch duration). You can see the implementation details for finding new
files here
I found two examples Java version
https://github.com/deepakkashyap/Spark-Streaming-with-RabbitMQ-/blob/master/example/Spark_project/CustomReceiver.java,
and Scala version. https://github.com/d1eg0/spark-streaming-toy
Thanks
Best Regards
On Tue, May 12, 2015 at 2:31 AM, dgoldenberg
Mesos has an HA option (of course, it includes ZooKeeper)
Thanks
Best Regards
On Tue, May 12, 2015 at 4:53 PM, James King jakwebin...@gmail.com wrote:
I know that it is possible to use Zookeeper and File System (not for
production use) to achieve HA.
Are there any other options now or in the
Are you using checkpointing/WAL etc? If yes, then it could be blocking on
disk IO.
Thanks
Best Regards
On Mon, May 11, 2015 at 10:33 PM, Seyed Majid Zahedi zah...@cs.duke.edu
wrote:
Hi,
I'm running TwitterPopularTags.scala on a single node.
Everything works fine for a while (about 30min),
Yep, you can try this low-level Kafka receiver
https://github.com/dibbhatt/kafka-spark-consumer. It's much more
flexible/reliable than the one that comes with Spark.
Thanks
Best Regards
On Tue, May 12, 2015 at 5:15 PM, James King jakwebin...@gmail.com wrote:
What I want is if the driver dies for some
before that, only took it down to change
code.
http://tinypic.com/r/2e4vkht/8
Regarding flexibility, both of the APIs available in Spark will do what
James needs, as I described.
On Tue, May 12, 2015 at 8:55 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Hi Cody,
If you are so sure
wrote:
Very nice! will try and let you know, thanks.
On Tue, May 12, 2015 at 2:25 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Yep, you can try this low-level Kafka receiver
https://github.com/dibbhatt/kafka-spark-consumer. It's much more
flexible/reliable than the one that comes with Spark
Try SparkConf.set("spark.akka.extensions", "Whatever"); underneath, I think
Spark won't ship properties which don't start with spark.* to the executors.
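For illustration (both property names are made up):

val conf = new SparkConf()
conf.set("spark.myapp.flag", "true") // starts with spark.*, so it gets shipped to the executors
conf.set("akka.extensions", "kamon.Kamon") // no spark. prefix, so it likely stays on the driver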
Thanks
Best Regards
On Mon, May 11, 2015 at 8:33 AM, Terry Hole hujie.ea...@gmail.com wrote:
Hi all,
I'd like to monitor the akka using kamon,
Did you try repartitioning? You might end up with a lot of time spent on
GC though.
Thanks
Best Regards
On Fri, May 8, 2015 at 11:59 PM, Vijay Pawnarkar vijaypawnar...@gmail.com
wrote:
I am using the Spark Cassandra connector to work with a table with 3
million records. Using .where() API
Have a look over here https://storm.apache.org/community.html
Thanks
Best Regards
On Sun, May 10, 2015 at 3:21 PM, anshu shukla anshushuk...@gmail.com
wrote:
http://stackoverflow.com/questions/30149868/generate-events-tuples-using-csv-file-with-timestamps
--
Thanks Regards,
Anshu Shukla
Have a look at this SO
http://stackoverflow.com/questions/24048729/how-to-read-input-from-s3-in-a-spark-streaming-ec2-cluster-application
question;
it has a discussion of the various ways of accessing S3.
Thanks
Best Regards
On Fri, May 8, 2015 at 1:21 AM, in4maniac sa...@skimlinks.com wrote:
Hi
What's your use case and what are you trying to achieve? Maybe there's a
better way of doing it.
Thanks
Best Regards
On Fri, May 8, 2015 at 10:20 AM, Richard Alex Hofer rho...@andrew.cmu.edu
wrote:
Hi,
I'm working on a project in Spark and am trying to understand what's going
on. Right now to
Since it's loading 24 records, it could be that your CSV is corrupted
(maybe the newline char isn't \n but \r\n, if it comes from a windows
environment; you can check this with *cat -v yourcsvfile.csv | more*).
Thanks
Best Regards
On Fri, May 8, 2015 at 11:23 AM, luohui20...@sina.com wrote:
I don't think you can use rawSocketStream, since the RSVP is from a web
server and you will have to send a GET request first to initialize the
communication. You are better off writing a custom receiver
https://spark.apache.org/docs/latest/streaming-custom-receivers.html for
your use case. For a
Looks like the jar you provided has some missing classes. Try this:
scalaVersion := "2.10.4"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.3.0",
"org.apache.spark" %% "spark-sql" % "1.3.0" % "provided",
"org.apache.spark" %% "spark-mllib" % "1.3.0" % "provided",
"log4j" % "log4j" %
We had a similar issue while working on one of our use cases where we were
processing at a moderate throughput (around 500MB/s). When the processing
time exceeded the batch duration, it started to throw up block-not-found
exceptions; I made a workaround for that issue and it is explained over here
Hi
With Spark streaming (all versions), when my processing delay (around 2-4
seconds) exceeds the batch duration (being 1 second) at a decent
scale/throughput (consuming around 100MB/s on a 1+2 node standalone cluster,
15GB and 4 cores each), the job will start to throw block not found
exceptions when the
You have an issue with your cluster setup. Can you paste your
conf/spark-env.sh and the conf/slaves files here?
The reason why your job is running fine is that you set the master
inside the job as local[*], which runs in local mode (not in standalone
cluster mode).
Thanks
Best Regards
On
I don't see a spark-streaming dependency at com.datastax.spark
http://mvnrepository.com/artifact/com.datastax.spark, but it does have a
kafka-streaming dependency.
Thanks
Best Regards
On Tue, May 5, 2015 at 12:42 AM, Eric Ho eric...@intel.com wrote:
Can I specify this in my build file?
Here's a complete example
https://databricks.com/blog/2015/03/20/using-mongodb-with-spark.html
Thanks
Best Regards
On Mon, May 4, 2015 at 12:57 PM, Yasemin Kaya godo...@gmail.com wrote:
Hi!
I am new to Spark and I want to begin Spark with a simple wordCount example
in Java. But I want to give
It could be filling up your /tmp directory. You need to set
spark.local.dir, or you can also specify SPARK_WORKER_DIR, to another
location which has sufficient space.
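For example (the path is just a placeholder for a disk with enough space; SPARK_WORKER_DIR would go in conf/spark-env.sh instead):

val conf = new SparkConf().set("spark.local.dir", "/mnt/bigdisk/spark-tmp")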
Thanks
Best Regards
On Mon, May 4, 2015 at 7:27 PM, shahab shahab.mok...@gmail.com wrote:
Hi,
I am getting No space left
Can you paste the complete stacktrace? It looks like you have a version
incompatibility with hadoop.
Thanks
Best Regards
On Sat, May 2, 2015 at 4:36 PM, drarse drarse.a...@gmail.com wrote:
When I run my program with Spark-Submit everything is OK. But when I try
to run in standalone mode I
With fileStream you can actually pass a filter parameter to avoid loading
up .tmp files/directories.
Also, when you move/rename a file, the file creation date doesn't change,
and hence Spark won't detect them, I believe.
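A sketch of such a filter (assuming the three-argument fileStream; the path and extension are made up):

import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

val stream = ssc.fileStream[LongWritable, Text, TextInputFormat](
  "/data/in",
  (p: Path) => !p.getName.endsWith(".tmp"), // skip temp files
  true) // newFilesOnly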
Thanks
Best Regards
On Sat, May 2, 2015 at 9:37 PM, Evo Eftimov
Looks like a version incompatibility; just make sure you have the proper
version of Spark. Also look further in the stacktrace for what is causing
the Futures timed out (it could also be a network issue if the ports aren't
opened properly)
Thanks
Best Regards
On Sat, May 2, 2015 at 12:04 AM,
500GB of data will have nearly 3900 partitions (500GB / 128MB HDFS blocks ≈
3900), and if you can have nearly that many cores and around 500GB of
memory then things will be lightning fast. :)
Thanks
Best Regards
On Sun, May 3, 2015 at 12:49 PM, sherine ahmed sherine.sha...@hotmail.com
wrote:
I need to use spark to
and block sizes are the same, shouldn't we end up with 8k
partitions?
On 4 May 2015 17:49, Akhil Das ak...@sigmoidanalytics.com wrote:
500GB of data will have nearly 3900 partitions, and if you can have nearly
that many cores and around 500GB of memory then things will be
lightning fast
It used to exit without any problem for me. You can basically check in the
driver UI (that runs on 4040) and see what exactly it's doing.
Thanks
Best Regards
On Fri, May 1, 2015 at 6:22 PM, James Carman ja...@carmanconsulting.com
wrote:
In all the examples, it seems that the spark application
It could be.
Thanks
Best Regards
On Fri, May 1, 2015 at 9:11 PM, roy rp...@njit.edu wrote:
Hi,
I have recently enabled log4j.rootCategory=WARN, console in the spark
configuration, but after that spark.logConf=True has become ineffective.
So I just want to confirm if this is because
There was a similar discussion over here
http://mail-archives.us.apache.org/mod_mbox/spark-user/201411.mbox/%3ccakz4c0s_cuo90q2jxudvx9wc4fwu033kx3-fjujytxxhr7p...@mail.gmail.com%3E
Thanks
Best Regards
On Fri, May 1, 2015 at 7:12 PM, Todd Nist tsind...@gmail.com wrote:
*Resending as I do not
In fact, sparkConf.set("spark.whateverPropertyYouWant", "Value") gets
shipped to the executors.
Thanks
Best Regards
On Fri, May 1, 2015 at 2:55 PM, Michael Ryabtsev mich...@totango.com
wrote:
Hi,
We've had a similar problem, but with the log4j properties file.
The only working way we've found was
Just make sure you are using the same version of Spark in your cluster
and in the project's build file.
Thanks
Best Regards
On Fri, May 1, 2015 at 2:43 PM, Michael Ryabtsev (Totango)
mich...@totango.com wrote:
Hi everyone,
I have a spark application that works fine on a standalone Spark
-memory 12g --executor-cores 4
12G is the limit imposed by the YARN cluster; I can't go beyond this.
ANY suggestions?
Regards,
Deepak
On Thu, Apr 30, 2015 at 6:48 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com
wrote:
Did not work. Same problem.
On Thu, Apr 30, 2015 at 1:28 PM, Akhil Das ak
This is spark mailing list :/
Yes, you can configure the following in the mapred-site.xml for that:
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
Thanks
Best Regards
On Tue, Apr 28, 2015 at 11:00 PM, Shushant Arora shushantaror...@gmail.com
wrote:
In
If the data is too huge and is in S3, that'll be a lot of network traffic;
instead, if the data is available in HDFS (with proper replication) then it
will be faster, as most of the time data will be available as
PROCESS_LOCAL/NODE_LOCAL to the executor.
Thanks
Best Regards
On Wed, Apr
You could try increasing your heap space explicitly, like export
_JAVA_OPTIONS=-Xmx10g; it's not the correct approach, but give it a try.
Thanks
Best Regards
On Tue, Apr 28, 2015 at 10:35 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:
I have a SparkApp that completes in 45 mins for 5 files (5*750MB
Does this speed up?
val rdd = sc.parallelize(1 to 100, 30)
rdd.count
Thanks
Best Regards
On Wed, Apr 29, 2015 at 1:47 AM, Anshul Singhle ans...@betaglide.com
wrote:
Hi,
I'm running the following code in my cluster (standalone mode) via spark
shell -
val rdd = sc.parallelize(1 to
Have a look at KafkaRDD
https://spark.apache.org/docs/latest/api/java/org/apache/spark/streaming/kafka/KafkaRDD.html
Thanks
Best Regards
On Wed, Apr 29, 2015 at 10:04 AM, dgoldenberg dgoldenberg...@gmail.com
wrote:
Hi,
I'm wondering about the use-case where you're not doing continuous,
This is how I used to do it:
- Login to the ec2 cluster (master)
- Make changes to the spark, and build it.
- Stop the old installation of spark (sbin/stop-all.sh)
- Copy old installation conf/* to modified version's conf/
- Rsync modified version to all slaves
- do sbin/start-all.sh from the
You can replace your cluster's assembly jar (on the master and workers)
with your custom-built assembly jar.
Thanks
Best Regards
On Tue, Apr 28, 2015 at 9:45 PM, Bo Fu b...@uchicago.edu wrote:
Hi all,
I have an issue. I added some timestamps in Spark source code and built it
using:
mvn package
One way you could try: inside the map, you can have a synchronized
thread, and you can block the map till the thread finishes up processing.
Thanks
Best Regards
On Wed, Apr 29, 2015 at 9:38 AM, Nastooh Avessta (navesta)
nave...@cisco.com wrote:
Hi
In a multi-node setup, I am
It is possible to access the filename; it's a bit tricky though.
val fstream = ssc.fileStream[LongWritable, IntWritable,
SequenceFileInputFormat[LongWritable,
IntWritable]]("/home/akhld/input/")
fstream.foreach(x => {
//You can get it with this object.
How about:
JavaPairDStream<LongWritable, Text> input =
jssc.fileStream(inputDirectory, LongWritable.class, Text.class,
TextInputFormat.class);
See the complete example over here
Option B would be fine. As the answer in the SO post itself says, "Since RDD
transformations merely build DAG descriptions without execution, in Option
A by the time you call unpersist, you still only have job descriptions and
not a running execution."
Also note, in Option A you are not specifying any
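In code, the safe ordering would look roughly like this (the names are made up):

val cached = input.map(transform).persist()
val n = cached.count() // an action: the job actually runs and populates the cache
cached.unpersist()     // safe only once the execution has really happened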
There's a similar issue reported over here
https://issues.apache.org/jira/browse/SPARK-6847
Thanks
Best Regards
On Tue, Apr 28, 2015 at 7:35 AM, wyphao.2007 wyphao.2...@163.com wrote:
Hi everyone, I am using val messages =
KafkaUtils.createDirectStream[String, String, StringDecoder,
You need to look deeper into your worker logs; if you look closely you may
find GC errors, IO exceptions, etc. that are triggering the timeout.
Thanks
Best Regards
On Mon, Apr 27, 2015 at 3:18 AM, Deepak Gopalakrishnan dgk...@gmail.com
wrote:
Hello Patrick,
Sure. I've posted this on user as
Isn't it already available on the driver UI (that runs on 4040)?
Thanks
Best Regards
On Mon, Apr 27, 2015 at 9:55 AM, Wenlei Xie wenlei@gmail.com wrote:
Hi,
I am wondering how we should understand the running time of SparkSQL
queries. For example, the physical query plan and the running
Like this?
messages.foreachRDD(rdd => {
if (rdd.count() > 0) //Do whatever you want.
})
Thanks
Best Regards
On Fri, Apr 24, 2015 at 11:20 PM, Sergio Jiménez Barrio
drarse.a...@gmail.com wrote:
Hi,
I need to compare the count of messages received, whether it is 0 or not,
but messages.count() returns a
I also want to add mine :/
Everyone wants to add theirs, it seems.
Thanks
Best Regards
On Fri, Apr 24, 2015 at 8:58 PM, madhu phatak phatak@gmail.com wrote:
Hi,
I understand that. The following page,
http://spark.apache.org/documentation.html, has an external tutorials/blogs
section which points
Maybe this will give you a good start:
https://github.com/apache/spark/pull/2077
Thanks
Best Regards
On Sat, Apr 25, 2015 at 1:29 AM, Giovanni Paolo Gibilisco gibb...@gmail.com
wrote:
Hi,
I would like to know if it is possible to build the DAG before actually
executing the application. My
Make sure you have >= 2 cores for your streaming application.
Thanks
Best Regards
On Sat, Apr 25, 2015 at 3:02 AM, Yang Lei genia...@gmail.com wrote:
I hit the same issue as if the directory has no files at all when running
the sample examples/src/main/python/streaming/hdfs_wordcount.py
, 2015 at 1:27 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Can you try writing to a different S3 bucket and confirm that?
Thanks
Best Regards
On Thu, Apr 23, 2015 at 12:11 AM, Daniel Mahler dmah...@gmail.com
wrote:
Hi Akhil,
It works fine when outprefix is a hdfs:///localhost/... url
The directory in ZooKeeper to store recovery state (default: /spark).
-Jeff
From: Sean Owen so...@cloudera.com
To: Akhil Das ak...@sigmoidanalytics.com
Cc: Michal Klos michal.klo...@gmail.com, User user@spark.apache.org
Date: Wed, 22 Apr 2015 11:05:46 +0100
Subject: Re: Multiple HA spark
There were some PRs about graphical representation with D3.js; you can
possibly see them on GitHub. Here are a few of them:
https://github.com/apache/spark/pulls?utf8=%E2%9C%93q=d3
Thanks
Best Regards
On Wed, Apr 22, 2015 at 8:08 AM, Punyashloka Biswal punya.bis...@gmail.com
wrote:
Dear
are in that dir. For me the most confusing thing is
that the executor can actually create HiveConf objects, but then cannot
find that class when the task deserializer is at work.
On 20 April 2015 at 14:18, Akhil Das ak...@sigmoidanalytics.com wrote:
Can you try sc.addJar("/path/to/your/hive/jar"), I
You can enable this flag to run multiple jobs concurrently. It might not be
production ready, but you can give it a try:
sc.set("spark.streaming.concurrentJobs", "2")
Refer to TD's answer here
You can simply use a custom InputFormat (AccumuloInputFormat) with the
hadoop RDDs (sc.newAPIHadoopFile etc.) for that; all you need to do is
pass the jobConfs. Here's a pretty clean discussion:
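A rough sketch of that wiring (assuming Accumulo's mapreduce InputFormat and a Job whose configuration already carries the Accumulo connection settings):

import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat
import org.apache.accumulo.core.data.{Key, Value}

val rdd = sc.newAPIHadoopRDD(job.getConfiguration, // the jobConf set up for Accumulo
  classOf[AccumuloInputFormat], classOf[Key], classOf[Value])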
With Maven you could do something like:
mvn -Dhadoop.version=2.3.0 -DskipTests clean package -pl core
Thanks
Best Regards
On Mon, Apr 20, 2015 at 8:10 PM, Shiyao Ma i...@introo.me wrote:
Hi.
My usage is only about the spark core and HDFS, so no Spark SQL,
MLlib or other components involved.
I saw
It could be a similar issue as
https://issues.apache.org/jira/browse/SPARK-4300
Thanks
Best Regards
On Tue, Apr 21, 2015 at 8:09 AM, donhoff_h 165612...@qq.com wrote:
Hi,
I am studying the RDD caching function and wrote a small program to verify
it. I ran the program in a Spark 1.3.0
I think DStream.transform is the one that you are looking for.
Thanks
Best Regards
On Mon, Apr 20, 2015 at 9:42 PM, Evo Eftimov evo.efti...@isecc.com wrote:
Is the only way to implement custom partitioning of a DStream via the
foreach
approach, so as to gain access to the actual RDDs comprising
Your spark master should be spark://swetha:7077 :)
Thanks
Best Regards
On Mon, Apr 20, 2015 at 2:44 PM, madhvi madhvi.gu...@orkash.com wrote:
PFA screenshot of my cluster UI
Thanks
On Monday 20 April 2015 02:27 PM, Akhil Das wrote:
Are you seeing your task being submitted to the UI
2015 12:28 PM, Akhil Das wrote:
In your eclipse, while you create your SparkContext, set the master uri
as shown in the web UI's top left corner like: spark://someIPorHost:7077
and it should be fine.
Thanks
Best Regards
On Mon, Apr 20, 2015 at 12:22 PM, madhvi madhvi.gu...@orkash.com
try doing a sc.addJar("path\to\your\postgres\jar")
Thanks
Best Regards
On Mon, Apr 20, 2015 at 12:26 PM, shashanksoni shashankso...@gmail.com
wrote:
I am using a spark 1.3 standalone cluster on my local Windows machine and
trying to load data from one of our servers. Below is my code -
import os
was suspecting some foul play with
classloaders.
On 20 April 2015 at 12:20, Akhil Das ak...@sigmoidanalytics.com wrote:
Looks like a missing jar, try to print the classpath and make sure the
hive jar is present.
Thanks
Best Regards
On Mon, Apr 20, 2015 at 11:52 AM, Manku Timma manku.tim