Hello, I am a Spark user. I start Spark with the "spark-shell.cmd" command in
Windows cmd. The first startup is normal, but after I use "Ctrl+C" to
force-close the Spark window, it can no longer start normally. The error
message is as follows:
Hi Team, we are facing an issue in production where we frequently get "Still have 1 request outstanding when connection with the hostname was closed" and "connection reset by peer" errors, as well as warnings like "failed to remove cache rdd" or "failed to remove broadcast variable". Please help us with how to
Hello!
When using Spark Standalone with Spark 2.4.4 / 3.0.0, we are seeing our
standalone Spark "applications" time out and show as "Finished" after around
an hour.
Here is a screenshot from the Spark master before it's marked as finished.
Hello,
We are considering whether to use Hadoop or Kubernetes as the cluster
manager for Spark. We would prefer Hadoop 3 because of its native
support for scheduling GPUs.
Although there is a Spark 3.0.0 preview2 version available that is
pre-built for Hadoop 3, I would like to know
uch on Oracle? How many partitions do you have on the Oracle side?
>
> On 06.04.2019 at 16:59, Lian Jiang wrote:
>
> Hi,
>
> My spark job writes into oracle db using:
>
> df.coalesce(10).write.format("jdbc").option("url", url)
>   .option("driver"
usamy Thirupathy wrote:
>
> Hi Jorn,
>
> Thanks for sharing the different options. Yes, we are trying to build a
> generic tool for Hive to Spark export.
> FYI, currently we are using Sqoop; we are trying to migrate from Sqoop to
> Spark.
>
> Thanks
> -G
>
> On
> -------------------------------------------
> +----+
> |aver|
> +----+
> | 3.0|
> +----+
>
> -------------------------------------------
> Batch: 1
> -------------------------------------------
> +----+
> |aver|
> +----+
> | 4.0|
> +----+
>
>
> Updated Code -
>
Hi,
I have a simple Java program to read data from Kafka using Spark Streaming.
When I run it from Eclipse on my Mac, it connects to ZooKeeper and the
bootstrap nodes, but it does not display any data. It does not give any
error; it just shows:
18/01/16 20:49:15 INFO Executor: Finished task
Dear Friends,
I am new to Spark DataFrames. My requirement is: I have a dataframe1 that
contains today's records and a dataframe2 that contains yesterday's records.
I need to compare today's records with yesterday's records and find the new
records which do not exist in the yeste
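(A common approach, as a minimal sketch: a left anti join on the key columns,
assuming a hypothetical key column "id" and Spark 2.0+:

// Rows in today's dataframe whose "id" does not appear in yesterday's.
val newRecords = dataframe1.join(dataframe2, Seq("id"), "left_anti")

// If entire rows are comparable, a set difference also works:
val newRows = dataframe1.except(dataframe2)
)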
I need it cached to improve throughput; I only hope it can be refreshed once a
day, not every batch.
> On Nov 13, 2017, at 4:49 PM, Burak Yavuz wrote:
>
> I think if you don't cache the jdbc table, then it should auto-refresh.
>
> On Mon, Nov 13, 2017 at 1:2
Hi,
I'm using Structured Streaming (Spark 2.2) to receive Kafka messages, and it
works great. The thing is, I need to join the Kafka messages with a relatively
static table stored in a MySQL database (let's call it metadata here).
So is it possible to reload the metadata table after some time interval (like
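(One sketch of a workaround, assuming the metadata table is read through the
DataFrame JDBC source with placeholder connection details: cache the static
side, and periodically unpersist and re-cache it so the next micro-batch
reloads it from MySQL.

val metadata = spark.read.format("jdbc")
  .option("url", "jdbc:mysql://dbhost:3306/mydb")   // placeholder
  .option("dbtable", "metadata")
  .option("user", "user").option("password", "password")
  .load()
metadata.cache()   // avoids re-reading MySQL on every micro-batch

// From a scheduled thread, e.g. once a day:
// metadata.unpersist(blocking = true)
// metadata.cache()   // the next use re-reads from MySQL and re-caches
)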
How much memory have you allocated to the driver? The driver stores some state
for tracking the task, stage and job history that you can see in the Spark
console; it does take up a significant portion of the heap, anywhere from
200MB - 1G, depending on your map-reduce steps.
Either way that is a good
hence
completing tasks quicker, and let the Spark scheduler (which is low-cost and
efficient based on my observation; it is never the bottleneck) do the work
of distributing the work among the tasks.
I have experimented with 1 task per core, 2-3 tasks per core, and all the
way up to 20+ tasks per core
Spark has more support for Scala; by that I mean more APIs are available
for Scala compared to Python or Java. Also, Scala code will be more concise
and easier to read. Java is very verbose.
On Thu, Feb 9, 2017 at 10:21 PM, Irving Duran
wrote:
> I would say Java, since it will be somewhat simi
one has solved similar
problem.
Thanks,
Bharath
On Mon, Oct 31, 2016 at 11:40 AM, Spark User
wrote:
> Trying again. Hoping to find some help in figuring out the performance
> bottleneck we are observing.
>
> Thanks,
> Bharath
>
> On Sun, Oct 30, 2016 at 11:58 AM, Spark User
>
Hi, the source file I have is on a local machine and it's pretty huge, around
150 GB. How do I go about it?
On Sun, Nov 20, 2016 at 8:52 AM, Steve Loughran
wrote:
>
> On 19 Nov 2016, at 17:21, vr spark wrote:
>
> Hi,
> I am looking for scala or python code samples to covert local ts
Hi All,
It seems like the heap usage for
org.apache.spark.deploy.yarn.ApplicationMaster keeps growing continuously.
The driver crashes with OOM eventually.
More details:
I have a spark streaming app that runs on spark-2.0. The
spark.driver.memory is 10G and spark.yarn.driver.memoryOverhead is
Hi,
I am looking for Scala or Python code samples to convert a local TSV file to
an ORC file and store it on distributed cloud storage (OpenStack).
So I need these 3 samples; please suggest (a sketch follows below).
1. read tsv
2. convert to orc
3. store on distributed cloud storage
thanks
VR
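(A rough sketch of those three steps on Spark 1.x, assuming a HiveContext for
ORC support; the paths, the Swift container URI and the two-column layout are
illustrative assumptions:

import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc)

// 1. read tsv from the local filesystem
val rows = sc.textFile("file:///path/to/local.tsv")
  .map(_.split("\t"))
  .map(a => (a(0), a(1)))                  // adjust to the real column layout

// 2. convert to a DataFrame, then ORC
val df = sqlContext.createDataFrame(rows).toDF("col1", "col2")

// 3. store on OpenStack Swift (needs the hadoop-openstack jars configured)
df.write.orc("swift://container.provider/path/orc-output")
)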
Trying again. Hoping to find some help in figuring out the performance
bottleneck we are observing.
Thanks,
Bharath
On Sun, Oct 30, 2016 at 11:58 AM, Spark User
wrote:
> Hi All,
>
> I have a UDAF that seems to perform poorly when its input is skewed. I
> have been debugg
me goes down to 4 minutes.
So I am trying to understand: why is there such a big performance
difference? What in the UDAF causes the processing time to increase by orders
of magnitude when there is skew in the data, as observed above?
Any insight from spark developers, contributors, or anyone else who
Hi All,
I'm trying to create a Dataset from an RDD and do groupBy on the Dataset. The
groupBy stage runs with 200 partitions, although the RDD had 5000
partitions. I also seem to have no way to change those 200 partitions on the
Dataset to some other large number. This seems to be affecting the
parall
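(The 200 comes from the shuffle partition setting, which DataFrame/Dataset
shuffles use regardless of the input RDD's partitioning; a sketch of raising
it, where spark.sql.shuffle.partitions is the real conf key:

// Spark 2.x SparkSession API; on 1.6 use sqlContext.setConf(...)
spark.conf.set("spark.sql.shuffle.partitions", "5000")
// or at submit time: --conf spark.sql.shuffle.partitions=5000
)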
Hi,
I have a continuous REST API stream which keeps spitting out data in the form
of JSON.
I access the stream using Python: requests.get(url, stream=True,
headers=headers).
I want to receive it using Spark and do further processing. I am not sure
of the best way to receive it in Spark.
What are
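(On the DStream side, one option is a custom receiver that pulls the HTTP
stream and stores each JSON line. A minimal Scala sketch, since the Receiver
API is a Scala/Java API; the URL handling here is simplified and illustrative:

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class RestStreamReceiver(url: String)
    extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  def onStart(): Unit = {
    // Pull the stream on a background thread so onStart returns quickly.
    new Thread("rest-receiver") {
      override def run(): Unit = receive()
    }.start()
  }

  def onStop(): Unit = {}   // the read loop below checks isStopped()

  private def receive(): Unit = {
    val source = scala.io.Source.fromURL(url)
    try {
      for (line <- source.getLines() if !isStopped()) store(line)
    } finally source.close()
    if (!isStopped()) restart("stream ended, reconnecting")
  }
}

// usage: val jsonLines = ssc.receiverStream(new RestStreamReceiver("http://host/stream"))
)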
tr3Counts = ds.groupBy('keyAttr', 'attr3').count()
//similar counts for 20 attributes
//code to merge attr1Counts and attr2Counts and attr3Counts
//translate it to desired output format and save the result.
Some more details:
1) The application is a spark streaming application
Hi Jacek/All,
I restarted my terminal and then tried spark-submit, and I am again getting
those errors. How do I see how many "runtimes" are running, and how can I have
only one? Somehow my Spark 1.6 and Spark 2.0 are conflicting. How do I fix
it?
I installed Spark 1.6 earlier using this
Hi,
I use the Scala IDE for Eclipse. I usually run jobs against my local Spark
installed on my Mac, then export the jars, copy them to my company's Spark
cluster, and run spark-submit on it.
This works fine.
But I want to run the jobs from the Scala IDE directly, using my company's
Spark cluster
Yes, I have both Spark 1.6 and Spark 2.0.
I unset the SPARK_HOME environment variable and pointed spark-submit to 2.0.
It's working now.
How do I uninstall/remove Spark 1.6 from my Mac?
Thanks
On Sun, Sep 25, 2016 at 4:28 AM, Jacek Laskowski wrote:
> Hi,
>
> Can you execute run-exampl
Hi,
I have this simple Scala app which works fine when I run it as a Scala
application from the Scala IDE for Eclipse.
But when I export it as a jar and run it with spark-submit I get the error
below. Please suggest.
bin/spark-submit --class com.x.y.vr.spark.first.SimpleApp test.jar
16/09/24 23
Hi Ted/All,
I did the below to get the full stack trace; see below. I am not able to
understand the root cause.
except Exception as error:
    traceback.print_exc()
and this is what I get:
File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/context.py",
line 580, in sql
return Data
pting to accept an unknown
offer b859f2f3-7484-482d-8c0d-35bd91c1ad0a-O168676558.
and many more lines like this on the screen with a similar message
On Wed, Aug 17, 2016 at 9:08 AM, Ted Yu wrote:
> Please include user@ in your reply.
>
> Can you reveal the snippet of hive sql
W0816 23:17:01.984846 16360 sched.cpp:1195] Attempting to accept an unknown
offer b859f2f3-7484-482d-8c0d-35bd91c1ad0a-O162910492
W0816 23:17:01.984987 16360 sched.cpp:1195] Attempting to accept an unknown
offer b859f2f3-7484-482d-8c0d-35bd91c1ad0a-O162910493
W0816 23:17:01.985124 16360 sched.cpp
Hi,
I am getting an error in the scenario below. Please suggest.
I have a virtual view in Hive:
view name: log_data
It has 2 columns:
query_map map
parti_date int
Here is my snippet for the Spark data frame:
res=sqlcont.sql("select parti_date FROM log_data
Hi Experts,
Please suggest
On Thu, Aug 11, 2016 at 7:54 AM, vr spark wrote:
>
> I have data which is json in this format
>
> myList: array
> |||-- elem: struct
> ||||-- nm: string (nullable = true)
> ||||-- vList: a
I have data which is json in this format
myList: array
|||-- elem: struct
||||-- nm: string (nullable = true)
||||-- vList: array (nullable = true)
|||||-- element: string (containsNull = true)
From my Kafka stream, I created a dataframe usin
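(To flatten a schema like the above, one sketch is to explode each array level
in turn; the column names are taken from the schema shown:

import org.apache.spark.sql.functions.{col, explode}

val flat = df
  .select(explode(col("myList")).as("elem"))            // one row per struct
  .select(col("elem.nm").as("nm"),
          explode(col("elem.vList")).as("v"))           // one row per inner string
)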
Hi,
I would like to know the steps to connect to Spark SQL from the Spring
framework (Web UI),
and also how to run and deploy the web application.
On Tue, Jul 26, 2016 at 12:05 PM, Cody Koeninger wrote:
> Have you tried filtering out corrupt records with something along the
> lines of
>
> df.filter(df("_corrupt_record").isNull)
>
> On Tue, Jul 26, 2016 at 1:53 PM, vr spark wrote:
> > i am readi
I am reading data from Kafka using Spark Streaming.
I am reading JSON and creating a dataframe.
I am using pyspark.
kvs = KafkaUtils.createDirectStream(ssc, kafkaTopic1, kafkaParams)
lines = kvs.map(lambda x: x[1])
lines.foreachRDD(mReport)
def mReport(clickRDD):
    clickDF = sqlContext.jsonRDD
I am reading data from Kafka using Spark Streaming.
I am reading JSON and creating a dataframe.
kvs = KafkaUtils.createDirectStream(ssc, kafkaTopic1, kafkaParams)
lines = kvs.map(lambda x: x[1])
lines.foreachRDD(mReport)
def mReport(clickRDD):
    clickDF = sqlContext.jsonRDD(clickRDD
val textFile = sc.textFile("README.md")
val linesWithSpark = textFile.filter(line => line.contains("Spark"))
linesWithSpark.saveAsTextFile("output1")
Same error:
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
file:/home/user/spark-1.5.1-bin-hadoop2.4/bin/README.md
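(sc.textFile resolves a bare path against the shell's working directory and
default filesystem, so a sketch of the usual fix is an explicit file: URI
pointing at where README.md actually lives; the path below assumes the layout
in the error message:

val textFile = sc.textFile("file:///home/user/spark-1.5.1-bin-hadoop2.4/README.md")
val linesWithSpark = textFile.filter(line => line.contains("Spark"))
linesWithSpark.saveAsTextFile("output1")
)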
val count = inputfile.flatMap(line => line.split(" ")).map(word =>
(word,1)).reduceByKey(_ + _);
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
scala> val count = inputfile.flatMap(line => line.split((" ").map(word =>
     | (word,1)).reduceByKey(_ + _)
     |
     |
You typed two blank lines.  Starting a new command.
That is what I am getting; how do I solve this?
Regards,
Ramkrishna KT
I am using Spark version 1.5.1 and I am getting errors in my first Spark
program, i.e., word count. Please help me solve this.
scala> val inputfile = sc.textFile("input.txt")
inputfile: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[13] at
textFile at <console>:21
scal
Hi All,
I am unable to run a Spark Streaming job in my Hadoop cluster; it is behaving
unexpectedly. When I submit a job, it fails by throwing a socket
exception in HDFS; if I run the same job a second or third time, it runs for
some time and then stops.
I am confused. Is there any configuration in YARN
experiences.
Thanks,
On Mon, Mar 28, 2016 at 10:40 PM, Spark Newbie
wrote:
> Hi All,
>
> The default value for spark.streaming.blockQueueSize is 10 in
> https://github.com/apache/spark/blob/branch-1.6/streaming/src/main/scala/org/apache/spark/streaming/receiver/BlockGenerator.scala.
> I
Hi All,
The default value for spark.streaming.blockQueueSize is 10 in
https://github.com/apache/spark/blob/branch-1.6/streaming/src/main/scala/org/apache/spark/streaming/receiver/BlockGenerator.scala.
In spark kinesis asl 1.4 the received Kinesis records are stored by calling
addData on line 115
Hi,
I am able to run a Spark Streaming job in local mode, but when I try to run
the same job in my YARN cluster, it throws errors.
Any help is appreciated in this regard.
Here are my exception logs:
Exception 1:
java.net.SocketTimeoutException: 48 millis timeout while waiting for
channel to
Dear All,
I am facing a problem with my Spark Twitter Streaming code: whenever twitter4j
throws an exception, I am unable to catch it. Could anyone help me
catch that exception?
Here is pseudo-code:
SparkConf sparkConf = new
SparkConf().setMaster("local[2]").setApp
Hi Friends,
Can anyone help me with how to terminate a Spark job in Eclipse using
Java code?
Thanks
Soniya
Hello friends,
I need urgent help.
I am using Spark Streaming to get tweets from Twitter and load the
data into HDFS. I want to find out the tweet source, whether it is from the
web or mobile web or Facebook, etc. Could you please help me with the logic?
Thanks
Soniya
logs and see why the sparkcontext is being
> shutdown? Similar discussion happened here previously.
> http://apache-spark-user-list.1001560.n3.nabble.com/RECEIVED-SIGNAL-15-SIGTERM-td23668.html
>
> Thanks
> Best Regards
>
> On Thu, Jan 21, 2016 at 5:11 PM, Soni spark
>
Hi Friends,
My Spark job runs successfully in local mode but fails in
cluster mode. Below is the error message I am getting. Can anyone help
me?
16/01/21 16:38:07 INFO twitter4j.TwitterStreamImpl: Establishing connection.
16/01/21 16:38:07 INFO twitter.TwitterReceiver: Twitter receiver
You need to make sure this class is accessible to all servers: since it is
cluster mode, the driver can be on any of the worker nodes.
On Fri, Dec 25, 2015 at 5:57 PM, Saiph Kappa wrote:
> Hi,
>
> I'm submitting a spark job like this:
>
> ~/spark-1.5.2-bin-hadoop2.6/bin/
Scenario 1:
val z = sc.parallelize(List("12","23","345",""), 2)
z.aggregate("")((x,y) => math.min(x.length, y.length).toString, (x,y) => x + y)
res143: String = 10

Scenario 2:
val z = sc.parallelize(List("12","23","","345"), 2)
z.aggregate("")((x,y) => math.min(x.length, y.length).toString, (x,y) => x
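(For what it's worth, this result depends on partitioning and combine order.
With two partitions ("12","23") and ("345",""): in the first, the seqOp gives
min("".length, "12".length).toString = "0", then min("0".length,
"23".length).toString = "1"; in the second it gives "0", then min("0".length,
"".length).toString = "0". The combOp then concatenates the per-partition
results onto "", and since partitions can finish in either order, both "10"
and "01" are possible answers.)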
local dirs and Spark recognizes that, so rather than
> re-computing, it will start from the following stage. So, this is a good
> thing in that you’re not re-computing a stage. In your case, it looks like
> there’s already the output of the userreqs RDD (reduceByKey) so it doesn’t
> re
What does the "Skipped Stage" below mean? Can anyone help clarify?
I was expecting 3 stages to succeed, but only 2 of them complete
while one is skipped.
Status: SUCCEEDED
Completed Stages: 2
Skipped Stages: 1
Scala REPL Code Used:
accounts is a basic RDD contains
Hi friends,
I am trying to create a Hive table through Spark with Java code in Eclipse,
using the code below:
HiveContext sqlContext = new org.apache.spark.sql.hive.HiveContext(sc.sc());
sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)");
but I am getting an error
Hi Friends,
I have created a Hive external table with a partition. I want to alter the
Hive table partition through Spark with Java code:
alter table table1
add if not exists
partition(datetime='2015-12-01')
location 'hdfs://localhost:54310/spark/twitter/datetime=2015-12-01/
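(One sketch is to issue the same DDL through the HiveContext's sql() method;
Scala shown, the Java call is analogous, and the path is the one from the
mail above:

sqlContext.sql(
  """ALTER TABLE table1 ADD IF NOT EXISTS
    |PARTITION (datetime='2015-12-01')
    |LOCATION 'hdfs://localhost:54310/spark/twitter/datetime=2015-12-01/'
  """.stripMargin)
)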
Folks,
I have the following program:
SparkConf conf = new SparkConf().setMaster("local").setAppName("Indexer")
    .set("spark.driver.maxResultSize", "2g");
conf.set("es.index.auto.create", "true");
conf.set("es.nodes", "localhost");
conf.set("es.port", "9200");
conf.set("es.write.operation", "index");
Hi Friends,
I have written a Spark Streaming program in Java to access Twitter tweets, and
it is working fine. I am able to copy the Twitter feeds to an HDFS location
batch-wise. For each batch, it creates a folder with an epoch
timestamp. For example,
if I give the HDFS location as hdfs
Pinging again ...
On Wed, Nov 25, 2015 at 4:19 PM, Ted Yu wrote:
> Which Spark release are you using ?
>
> Please take a look at:
> https://issues.apache.org/jira/browse/SPARK-5594
>
> Cheers
>
> On Wed, Nov 25, 2015 at 3:59 PM, Spark Newbie
> wrote:
>
>>
Pinging again to see if anyone has any thoughts or prior experience with
this issue.
On Wed, Nov 25, 2015 at 3:56 PM, Spark Newbie
wrote:
> Hi Spark users,
>
> I have been seeing this issue where receivers enter a "stuck" state after
> it encounters the following exc
Using Spark-1.4.1
On Wed, Nov 25, 2015 at 4:19 PM, Ted Yu wrote:
> Which Spark release are you using ?
>
> Please take a look at:
> https://issues.apache.org/jira/browse/SPARK-5594
>
> Cheers
>
> On Wed, Nov 25, 2015 at 3:59 PM, Spark Newbie
> wrote:
>
>>
Hi Spark users,
I'm seeing the below exceptions once in a while, which cause tasks to fail
(even after retries, so I think it is a non-recoverable exception), hence the
stage fails and then the job gets aborted.
Exception ---
java.io.IOException: org.apache.spark.SparkException: Failed t
Hi Spark users,
I have been seeing this issue where receivers enter a "stuck" state after
encountering the following exception: "Error in block pushing thread -
java.util.concurrent.TimeoutException: Futures timed out".
I am running the application on spark-1.4.1 and u
Dear Friends,
I am struggling with Spark Twitter streaming. I am not getting any data.
Please correct the code below if you find any mistakes.
import org.apache.spark.*;
import org.apache.spark.api.java.function.*;
import org.apache.spark.streaming.*;
import org.apache.spark.streaming.api.java
Are you using EMR?
You can install Hadoop-2.6.0 along with Spark-1.5.1 in your EMR cluster.
That brings the s3a jars to the worker nodes, and they become available to
your application.
On Thu, Oct 15, 2015 at 11:04 AM, Scott Reynolds
wrote:
> List,
>
> Right now we build our spark jobs
e Spark's configuration page). The job by default does not get
> resubmitted.
>
> You could try getting the logs of the failed executor, to see what caused
> the failure. Could be a memory limit issue, and YARN killing it somehow.
>
>
>
> On Wed, Oct 14, 2015 at 11:05
regardless of whether they were successfully processed or not.
On Wed, Oct 14, 2015 at 11:01 AM, Spark Newbie
wrote:
> I ran 2 different Spark 1.5 clusters that have been running for more than
> a day now. I do see jobs getting aborted due to task retries maxing out
> (default 4) d
I ran 2 different Spark 1.5 clusters that have been running for more than a
day now. I do see jobs getting aborted due to task retries maxing out
(default 4) due to ConnectionException. It seems like the executors die and
get restarted, and I was unable to find the root cause (same app cod
Hi Spark users,
I'm seeing the below exception in my Spark Streaming application. It
happens in the first stage, where the Kinesis receivers receive records and
perform a flatMap operation on the unioned DStream. A coalesce step also
happens as part of that stage to optimize the perfor
Hi Spark users,
Is there an easy way to turn on DEBUG logs in receivers and executors?
Setting sparkContext.setLogLevel seems to turn on the DEBUG level only on the
Driver.
Thanks,
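(A common approach is shipping a DEBUG log4j config to every executor; a
sketch, where --files and spark.executor.extraJavaOptions are real
spark-submit options and the file name is illustrative:

spark-submit \
  --files /path/to/log4j-debug.properties \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j-debug.properties" \
  ...

# log4j-debug.properties, minimally:
# log4j.rootCategory=DEBUG, console
)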
logs? I
can send them if that will help dig into the root cause.
On Fri, Oct 9, 2015 at 2:18 PM, Tathagata Das wrote:
> Can you provide the before stop and after restart log4j logs for this?
>
> On Fri, Oct 9, 2015 at 2:13 PM, Spark Newbie
> wrote:
>
>> Hi Spark Users,
Hi Spark Users,
I'm seeing checkpoint restore failures causing the application startup to
fail with the below exception. When I do "ls" on the S3 path, I sometimes see
the key listed and sometimes not. There are no part files
(checkpointed files) in the specified S3 path. T
Folks,
I have an input file which is gzipped. I use sc.textFile("foo.gz") and I see
the following problem. Can someone help me fix this?
15/09/03 10:05:32 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
15/09/03 10:05:32 INFO CodecPool: Got brand-new decompress
Hi all,
Can we create a data frame from an Excel sheet or CSV file? In the example
below, it seems they support only JSON:
DataFrame df =
sqlContext.read().json("examples/src/main/resources/people.json");
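(On Spark 1.x, CSV needs the external spark-csv package; a sketch, where the
package coordinates are the well-known Databricks ones and the path is
illustrative:

// spark-shell --packages com.databricks:spark-csv_2.10:1.5.0
// (on Spark 2.0+ csv is built in: sqlContext.read.format("csv"))
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")        // first line holds column names
  .option("inferSchema", "true")   // sample the file to guess types
  .load("examples/src/main/resources/people.csv")
)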
I was running a Spark job to crunch a 9GB Apache log file when I saw the
following error:
15/08/25 04:25:16 WARN scheduler.TaskSetManager: Lost task 99.0 in stage 37.0
(TID 4115, ip-10-150-137-100.ap-southeast-1.compute.internal):
ExecutorLostFailure (executor 29 lost)
15/08/25 04:25:16 INFO
Folks,
I use the following Streaming API from KafkaUtils:
public JavaPairInputDStream<String, String> inputDStream() {
    HashSet<String> topicsSet = new HashSet<String>(Arrays.asList(topics.split(",")));
    HashMap<String, String> kafkaParams = new HashMap<String, String>();
    kafkaParams.put(Tokens.KAFKA_BROKER_LIST_TOKEN.getRealTokenName(), brokers);
Thanks for the reply.
Are Standalone and Mesos the only options? Is there a way to auto-relaunch if
the driver runs as a Hadoop YARN application?
On Wednesday, 19 August 2015 12:49 PM, Todd wrote:
There is an option for the spark-submit (Spark standalone or Mesos with
cluster deploy
Folks,
As I see it, the Driver program is a single point of failure. Now, I have seen
ways to make it recover from failures on a restart (using
checkpointing), but I have not seen anything on how to restart it
automatically if it crashes.
Will running the Driver as a Hadoop Yarn Applica
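(On YARN in cluster mode, the ApplicationMaster, and with it the driver, is
resubmitted on failure up to a configurable number of attempts; a sketch of
the relevant submit options, where both conf keys are real (the
validity-interval one appeared in later 1.x releases):

spark-submit \
  --master yarn --deploy-mode cluster \
  --conf spark.yarn.maxAppAttempts=4 \
  --conf spark.yarn.am.attemptFailuresValidityInterval=1h \
  your-streaming-app.jar

Spark Standalone and Mesos offer --supervise for the same purpose, as noted
elsewhere in the thread.)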
Forgot to mention. Here is how I run the program :
./bin/spark-submit --conf "spark.app.master"="local[1]"
~/workspace/spark-python/ApacheLogWebServerAnalysis.py
On Wednesday, 12 August 2015 10:28 AM, Spark Enthusiast
wrote:
I wrote a small python program
I wrote a small python program:
def parseLogs(self):
    """ Read and parse log file """
    self._logger.debug("Parselogs() start")
    self.parsed_logs = (self._sc
                        .textFile(self._logFile)
                        .map(self._parseApacheLogLine)
                        .cac
All the examples of Spark Streaming programming that I see assume streams of
lines that are then tokenised and acted upon (like the WordCount example).
How do I process streams that span multiple lines? Are there examples that I
can use?
String password = "";
String url = "jdbc:hive2://quickstart.cloudera:1/default";
On Friday, July 17, 2015 2:29 AM, Roberto Coluccio
wrote:
Hello community,
I'm currently using Spark 1.3.1 with Hive support for outputting processed data
on an external H
I struggled a lot with Scala: almost 10 days with no improvement. But when I
switched to Java 8, things were smooth, and I used DataFrames with Redshift and
Hive and all looked good. If you are very good in Scala then go with Scala;
otherwise Java is the best fit.
This is just my opinion, because I am Ja
Does DataFrame support nested JSON for dumping directly to a database?
For simple JSON it is working fine:
{"id":2,"name":"Gerald","email":"gbarn...@zimbio.com","city":"Štoky","country":"Czech
Republic","ip":"92.158.154.75"},
But for nested JSON it fails to load:
root |-- rows: array (nullable = true)
Hi All,
To start a new project in Spark, which technology is good: Java 8 or Scala?
I am a Java developer. Can I start with Java 8, or do I need to learn Scala?
Which is the better technology for a quick start on any POC project?
Thanks
- su
I have Spark 1.4 deployed on AWS EMR, but the SparkR read.df
method cannot load data from AWS S3.
1) "read.df" error message:
read.df(sqlContext,"s3://some-bucket/some.json","json")
15/07/09 04:07:01 ERROR r.RBackendHandler: loadDF on
org.apache.s
Hi, I am looking at how to load data into Redshift. Thanks
On Wednesday, July 8, 2015 12:47 AM, shahab
wrote:
Hi,
I did some experiments with loading data from S3 into Spark. I loaded data from
S3 using sc.textFile(). Have a look at the following code snippet:
val csv = sc.tex
Hi, can you help me with how to load data from an S3 bucket into Redshift? If
you have sample code, can you please send it to me?
Thanks, su
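(A common route at the time was the spark-redshift package, which stages data
in S3 and issues a Redshift COPY; a minimal sketch with placeholder connection
values:

// Read the source from S3 (format depends on the data), then write to Redshift.
// Requires the com.databricks:spark-redshift package and a Redshift JDBC driver.
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .load("s3n://my-bucket/input/")

df.write
  .format("com.databricks.spark.redshift")
  .option("url", "jdbc:redshift://host:5439/db?user=USER&password=PASS")
  .option("dbtable", "my_table")
  .option("tempdir", "s3n://my-bucket/tmp/")   // staging area for COPY
  .mode("append")
  .save()
)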
Folks,
My use case is as follows:
My Driver program will be aggregating a bunch of event streams and acting on
them. The action on the aggregated events is configurable and can change
dynamically.
One way I can think of is to run the Spark Driver as a service, where a config
push can be caught via
Hi,
I have to build a system that reacts to a set of events. Each of these events
is a separate stream by itself, consumed from a different Kafka
topic, and hence will have a different InputDStream.
Questions:
Will I be able to do joins across multiple InputDStreams and collate the outp
.jets3t.service.S3ServiceException: S3 HEAD request failed for
'/user%2Fdidi' - ResponseCode=400, ResponseMessage=Bad Request
What does the user have to do here? I am using key & secret!
How can I simply create an RDD from a text file on S3?
Thanks
Didi
ble
at
org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:315)
at
org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:305)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:1
Spark is based on Scala and written in Scala. To debug and fix issues, I guess
learning Scala is good for the long term? Any advice?
On Thursday, June 25, 2015 1:26 PM, ayan guha wrote:
I am a Python fan, so I use Python. But what I have noticed is that some
features are typically 1-2 release
Hi All,
I am new to Spark; I just want to know which technology is good/best for
learning Spark:
1) Scala 2) Java 3) Python
I know Spark supports all 3 languages, but which one is best?
Thanks, su
Again, by Storm, you mean Storm Trident, correct?
On Wednesday, 17 June 2015 10:09 PM, Michael Segel
wrote:
Actually the reverse.
Spark Streaming is really a micro-batch system where the smallest window is 1/2
a second (500ms). So for CEP, it's not really a good idea.
So in terms
5 11:57 AM, Enno Shioji wrote:
We've evaluated Spark Streaming vs. Storm and ended up sticking with Storm.
Some of the important drawbacks are:
Spark has no back pressure (the receiver rate limit can alleviate this to a
certain point, but it's far from ideal). There is also no ex
event
Upstream services ---> KAFKA ---> Event Stream Processor ---> Complex Event
Processor ---> Elastic Search.
From what I understand, Storm will make a very good ESP and Spark Streaming
will make a good CEP.
But, we are also eva