Re: Standalone mode connection failure from worker node to master

2015-07-14 Thread sivarani
I am also facing the same issue. Has anyone figured it out? Please help.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Standalone-mode-connection-failure-from-worker-node-to-master-tp23101p23816.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: java.lang.IllegalStateException: unread block data

2014-12-17 Thread sivarani
Same issue here. Can anyone help, please?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-IllegalStateException-unread-block-data-tp20668p20745.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




spark 1.1.1 Maven dependency

2014-12-09 Thread sivarani
Dear All,

I am using Spark Streaming. It was working fine when I was on Spark 1.0.2, but now I
repeatedly get a few issues, such as ClassNotFoundException, even though I am using the
same pom.xml with the version updated for all Spark modules. I use the spark-core,
spark-streaming, and spark-streaming-kafka modules.

It constantly throws errors about missing commons-configuration, commons-lang, and
commons-logging.

How do I get all the dependencies needed to run Spark Streaming? Is there any way, or do
we just have to find them by trial and error?

My pom dependencies:

<dependencies>
  <dependency>
    <groupId>javax.servlet</groupId>
    <artifactId>servlet-api</artifactId>
    <version>2.5</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.0.2</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>1.0.2</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka_2.10</artifactId>
    <version>1.0.2</version>
  </dependency>
  <dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-log4j12</artifactId>
    <version>1.7.5</version>
  </dependency>
  <dependency>
    <groupId>commons-logging</groupId>
    <artifactId>commons-logging</artifactId>
    <version>1.1.1</version>
  </dependency>
  <dependency>
    <groupId>commons-configuration</groupId>
    <artifactId>commons-configuration</artifactId>
    <version>1.6</version>
  </dependency>
</dependencies>

Am I missing something here?






--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/spark-1-1-1-Maven-dependency-tp20590.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: Submiting Spark application through code

2014-11-26 Thread sivarani
I am trying to submit a Spark Streaming program. When I submit a batch job it works, but
when I do the same with Spark Streaming it throws the exception below. Can anyone please
help?

14/11/26 17:42:25 INFO server.AbstractConnector: Started
SocketConnector@0.0.0.0:50016
14/11/26 17:42:25 INFO server.Server: jetty-8.1.14.v20131031
14/11/26 17:42:25 INFO server.AbstractConnector: Started
SelectChannelConnector@0.0.0.0:4040
14/11/26 17:42:25 INFO ui.SparkUI: Started SparkUI at
http://172.18.152.36:4040
14/11/26 17:42:30 INFO spark.SparkContext: Added JAR
/Volumes/Official/workspace/ZBI/target/ZBI-0.0.1-SNAPSHOT-jar-with-dependencies.jar
at
http://172.18.152.36:50016/jars/ZBI-0.0.1-SNAPSHOT-jar-with-dependencies.jar
with timestamp 1417003949988
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/spark/ui/SparkUITab
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at
org.apache.spark.streaming.StreamingContext.init(StreamingContext.scala:161)
at
org.apache.spark.streaming.StreamingContext.init(StreamingContext.scala:91)
at
org.apache.spark.streaming.api.java.JavaStreamingContext.init(JavaStreamingContext.scala:78)
at com.zoho.zbi.spark.SparkStreaming$1.create(SparkStreaming.java:51)
at
org.apache.spark.streaming.api.java.JavaStreamingContext$$anonfun$9.apply(JavaStreamingContext.scala:564)
at
org.apache.spark.streaming.api.java.JavaStreamingContext$$anonfun$9.apply(JavaStreamingContext.scala:563)
at scala.Option.getOrElse(Option.scala:120)
at
org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:545)
at
org.apache.spark.streaming.api.java.JavaStreamingContext$.getOrCreate(JavaStreamingContext.scala:563)
at
org.apache.spark.streaming.api.java.JavaStreamingContext.getOrCreate(JavaStreamingContext.scala)
at 
com.zoho.zbi.spark.SparkStreaming.callStreaming(SparkStreaming.java:98)



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Submiting-Spark-application-through-code-tp17452p19934.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: Streaming window operations not producing output

2014-11-05 Thread sivarani
Hi TD,

I would like to run streaming 24/7 and am trying to use getOrCreate, but it is not
working. Could you please help with this?
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-getOrCreate-tc18060.html



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Streaming-window-operations-not-producing-output-tp17504p18169.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: java.io.NotSerializableException: org.apache.spark.SparkEnv

2014-11-05 Thread sivarani
Hi, thanks for replying.

I have posted my code at
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-getOrCreate-tc18060.html



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/java-io-NotSerializableException-org-apache-spark-SparkEnv-tp10641p18172.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: Submiting Spark application through code

2014-11-05 Thread sivarani
Thanks, boss, it's working :)



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Submiting-Spark-application-through-code-tp17452p18250.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: Spark Streaming: foreachRDD network output

2014-11-05 Thread sivarani
Anyone have any luck?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-foreachRDD-network-output-tp15205p18251.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: java.io.NotSerializableException: org.apache.spark.SparkEnv

2014-11-04 Thread sivarani
Same issue here. How did you solve it?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/java-io-NotSerializableException-org-apache-spark-SparkEnv-tp10641p18047.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Spark Streaming getOrCreate

2014-11-04 Thread sivarani
Hi All

I am using Spark Streaming:

public class SparkStreaming {
    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("Sales");
        JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(5000));
        String chkPntDir = ""; // get checkpoint dir
        jssc.checkpoint(chkPntDir);
        JavaSpark jSpark = new JavaSpark(); // this is where I have the business logic
        JavaStreamingContext newJSC = jSpark.callTest(jssc);
        newJSC.start();
        newJSC.awaitTermination();
    }
}

where

public class JavaSpark implements Serializable {
    public JavaStreamingContext callTest(JavaStreamingContext jssc) {
        // logic goes here
        return jssc;
    }
}

This works fine.

But then I try getOrCreate, as I want Spark Streaming to run 24/7:


JavaStreamingContextFactory contextFactory = new JavaStreamingContextFactory() {
    @Override
    public JavaStreamingContext create() {
        SparkConf sparkConf = new SparkConf().setAppName(appName).setMaster(master);
        JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(5000));
        jssc.checkpoint(checkpointDir);
        JavaSpark js = new JavaSpark();
        JavaStreamingContext newJssc = js.callTest(jssc); // This is where all the logic is
        return newJssc;
    }
};

JavaStreamingContext context =
    JavaStreamingContext.getOrCreate(checkpointDir, contextFactory);
context.start();
context.awaitTermination();

This does not work:

14/11/04 19:40:37 ERROR SparkDeploySchedulerBackend: Application has been
killed. Reason: All masters are unresponsive! Giving up.
Exception in thread "Thread-37" org.apache.spark.SparkException: Job aborted
due to stage failure: All masters are unresponsive! Giving up.
at
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1049)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
at scala.Option.foreach(Option.scala:236)
at
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
14/11/04 19:40:37 ERROR JobScheduler: Error running job streaming job
141511018 ms.0
org.apache.spark.SparkException: Job aborted due to stage failure: All
masters are unresponsive! Giving up.
at
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1049)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)


Please help me out.

Earlier the business logic was inside the ContextFactory, but then I got

org.apache.spark.SparkException: Job aborted due to stage failure: Task not
serializable: java.io.NotSerializableException:
com.zoho.zbi.spark.PaymentStreaming$1

Then I added private static final long serialVersionUID = -5751968749110204082L;
everywhere, which did not work either.

I then got the same "All masters are unresponsive" failure shown above.
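For reference, here is a minimal sketch of the getOrCreate pattern I am trying to follow,
assuming the Spark 1.x JavaStreamingContextFactory API; the class name CheckpointedApp,
the checkpoint path, and the buildStreams helper are placeholders rather than my actual
code:

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.api.java.JavaStreamingContextFactory;

public class CheckpointedApp {
    public static void main(String[] args) {
        final String checkpointDir = "hdfs:///tmp/streaming-checkpoint"; // placeholder path

        JavaStreamingContextFactory factory = new JavaStreamingContextFactory() {
            @Override
            public JavaStreamingContext create() {
                SparkConf conf = new SparkConf().setAppName("Sales");
                JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(5000));
                jssc.checkpoint(checkpointDir);
                buildStreams(jssc); // all DStreams must be defined here, inside create()
                return jssc;
            }
        };

        // Restores the context from the checkpoint if one exists, otherwise calls create().
        JavaStreamingContext context = JavaStreamingContext.getOrCreate(checkpointDir, factory);
        context.start();
        context.awaitTermination();
    }

    // Keeping the DStream-building logic in a static method avoids dragging a
    // non-serializable enclosing instance into the closures.
    private static void buildStreams(JavaStreamingContext jssc) {
        // map / updateStateByKey / output operations go here
    }
}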

Re: Spark Streaming getOrCreate

2014-11-04 Thread sivarani
Anybody have any luck? I am also trying to return NONE to delete a key from the state.
Will null help? How do I use Scala's None in Java?

My code goes this way:

public static class ScalaLang {
    public static <T> Option<T> none() {
        return (Option<T>) None$.MODULE$;
    }
}

Function2<List<Double>, Optional<Double>, Optional<Double>> updateFunction =
    new Function2<List<Double>, Optional<Double>, Optional<Double>>() {
        @Override
        public Optional<Double> call(List<Double> values, Optional<Double> state) {
            Double newSum = state.or(0D);
            if (values.isEmpty()) {
                System.out.println("empty value");
                return null;  // I WANT TO RETURN NONE TO DELETE THE KEY,
                              // but when I set ScalaLang.none() it shows an error
            } else {
                for (double i : values) {
                    newSum += i;
                }
                return Optional.of(newSum);
            }
        }
    };
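For comparison, here is a minimal sketch of the same update function, under the
assumption that this Spark 1.x Java API expects Guava's com.google.common.base.Optional:
returning Optional.absent(), rather than null or a scala.Option, is what marks a key for
removal from the state (the class name StateUpdate is just a placeholder).

import java.util.List;
import com.google.common.base.Optional;
import org.apache.spark.api.java.function.Function2;

public class StateUpdate {
    static final Function2<List<Double>, Optional<Double>, Optional<Double>> updateFunction =
        new Function2<List<Double>, Optional<Double>, Optional<Double>>() {
            @Override
            public Optional<Double> call(List<Double> values, Optional<Double> state) {
                if (values.isEmpty()) {
                    // no new values for this key in this batch: drop it from the state
                    return Optional.absent();
                }
                double newSum = state.or(0D);
                for (double v : values) {
                    newSum += v;
                }
                return Optional.of(newSum);
            }
        };
}

Whether an empty batch is really the right trigger for dropping a key is a separate
design question; the point of the sketch is only the Optional.absent() return value.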



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-getOrCreate-tp18060p18139.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: Submiting Spark application through code

2014-10-31 Thread sivarani
I tried running it, but it didn't work.

public static final SparkConf batchConf = new SparkConf();
String master = "spark://sivarani:7077";
String spark_home = "/home/sivarani/spark-1.0.2-bin-hadoop2/";
String jar = "/home/sivarani/build/Test.jar";
public static final JavaSparkContext batchSparkContext =
    new JavaSparkContext(master, "SparkTest", spark_home, new String[] {jar});

public static void main(String args[]) {
    runSpark(0, "TestSubmit");
}

public static void runSpark(int crit, String dataFile) {
    JavaRDD<String> logData = batchSparkContext.textFile(input, 10);
    // flatMap
    // mapToPair
    // reduceByKey
    List<Tuple2<String, Integer>> output1 = counts.collect(); // counts comes from the omitted transformations above
}


This works fine with spark-submit, but when I tried to submit through code with

LeadBatchProcessing.runSpark(0, "TestSubmit.csv");

I get the following error:

HTTP Status 500 - javax.servlet.ServletException:
org.apache.spark.SparkException: Job aborted due to stage failure: Task
0.0:0 failed 4 times, most recent failure: TID 29 on host 172.18.152.36
failed for unknown reason
Job aborted due to stage failure: Task 0.0:0 failed 4 times, most recent
failure: TID 29 on host 172.18.152.36 failed for unknown reason Driver
stacktrace:



Any advice on this?




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Submiting-Spark-application-through-code-tp17452p17797.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Spark Streaming Issue not running 24/7

2014-10-30 Thread sivarani
The problem is simple.

I want to stream data 24/7, do some calculations, and save the result in a csv/json file
so that I can use it for visualization with dc.js/d3.js.

I opted for Spark Streaming on a YARN cluster with Kafka and tried running it 24/7.

I use groupByKey and updateStateByKey to hold the computed historical data.

Initially the streaming works fine, but after a few hours I get:

14/10/30 23:48:49 ERROR TaskSetManager: Task 2485162.0:3 failed 4 times;
aborting job
14/10/30 23:48:50 ERROR JobScheduler: Error running job streaming job
141469227 ms.1
org.apache.spark.SparkException: Job aborted due to stage failure: Task
2485162.0:3 failed 4 times, most recent failure: Exception failure in TID
478548 on host 172.18.152.36: java.lang.ArrayIndexOutOfBoundsException

Driver stacktrace:
at
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1049)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
I guess it is due to the groupByKey and updateStateByKey; I tried groupByKey(100) to
increase the partitions.

Also, when data is in the state, say at the 10th second 1,000 records are in the state
and at the 100th second 20,000 records are in the state, of which 19,000 records are no
longer being updated, how do I remove them from the state? For updateStateByKey returning
None: how and when do I do that, how will I know when to send None, and how do I save the
data before setting None?
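One possible pattern, sketched here as an illustration rather than taken from my actual
code: extend the state value so it carries a counter of idle batches next to the running
sum, and return Optional.absent() once a key has been idle long enough. This assumes the
Spark 1.x Java API with Guava's Optional; the MAX_IDLE_BATCHES threshold and the class
name AgeOutState are made up.

import java.util.List;
import com.google.common.base.Optional;
import org.apache.spark.api.java.function.Function2;
import scala.Tuple2;

public class AgeOutState {
    static final int MAX_IDLE_BATCHES = 20; // made-up threshold

    // The state value is (running sum, consecutive idle batches).
    static final Function2<List<Double>, Optional<Tuple2<Double, Integer>>,
                           Optional<Tuple2<Double, Integer>>> updateFunction =
        new Function2<List<Double>, Optional<Tuple2<Double, Integer>>,
                      Optional<Tuple2<Double, Integer>>>() {
            @Override
            public Optional<Tuple2<Double, Integer>> call(
                    List<Double> values, Optional<Tuple2<Double, Integer>> state) {
                Tuple2<Double, Integer> current = state.or(new Tuple2<Double, Integer>(0D, 0));
                if (values.isEmpty()) {
                    int idle = current._2() + 1;
                    if (idle > MAX_IDLE_BATCHES) {
                        return Optional.absent(); // drop the key from the state
                    }
                    return Optional.of(new Tuple2<Double, Integer>(current._1(), idle));
                }
                double sum = current._1();
                for (double v : values) {
                    sum += v;
                }
                return Optional.of(new Tuple2<Double, Integer>(sum, 0)); // reset idle counter
            }
        };
}

The value would still have to be written out by an output operation (for example inside
foreachRDD) before the batch in which the key is dropped.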

I also tried not sending any data for a few hours, but when I check the web UI the
application shows FINISHED:

app-20141030203943-  NewApp  0  6.0 GB  2014/10/30 20:39:43  hadoop  FINISHED  4.2 h

This makes me confused. The code calls awaitTermination, and I did not terminate the
task. Will streaming stop if no data is received for a significant amount of time? Is
there any documentation on how long Spark will run when no data is streamed?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Issue-not-running-24-7-tp17791.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Streaming Question regarding lazy calculations

2014-10-29 Thread sivarani
Hi all,

I am using Spark Streaming with Kafka, running 24/7.

My code is something like this:

JavaDStream<String> data = messages.map(new MapData());
JavaPairDStream<String, Iterable<String>> records =
    data.mapToPair(new dataPair()).groupByKey(100);
records.print();
JavaPairDStream<String, Double> result =
    records.mapValues(new Sum()).updateStateByKey(updateFunction).cache();

result.foreach {
    write(result, path); // pseudocode: writing the result to the output path
}

Since result holds the historical value, even when there is no input record for 10
minutes and result has not changed, I end up writing it again and again, every 3 seconds.

I tried checking

if (records.count() > 0) {
    result.foreach( /* write file */ );
}

but Spark is not honoring my check. Any insight on how to achieve this?
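For what it is worth, here is a sketch of one way to make that check happen per batch.
records.count() returns a new DStream, so a plain if() around it runs only once, while
the DStream graph is being built; a per-batch check has to run on each batch's RDD inside
foreachRDD. The snippet reuses the records stream from above, assumes the Spark 1.x Java
API (a Function returning Void), and the write call is only a placeholder comment.

// assumed imports: org.apache.spark.api.java.JavaPairRDD,
//                  org.apache.spark.api.java.function.Function
records.foreachRDD(new Function<JavaPairRDD<String, Iterable<String>>, Void>() {
    @Override
    public Void call(JavaPairRDD<String, Iterable<String>> rdd) {
        if (rdd.count() > 0) {
            // this branch runs only for batches that actually carried data,
            // so a conditional write could be triggered from here
        }
        return null;
    }
});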



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Streaming-Question-regarding-lazy-calculations-tp17636.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Submiting Spark application through code

2014-10-28 Thread sivarani
Hi,

I am submitting my Spark application in the following fashion:

bin/spark-submit --class NetworkCount --master spark://abc.test.com:7077 
try/simple-project/target/simple-project-1.0-jar-with-dependencies.jar

But is there any other way to submit a Spark application through code?

For example, I check for a condition, and if it is true I want to run the Spark
application:

if (isConditionTrue) {
    runSpark("NetworkCount", masterUrl, jar);
}

I am aware we can set the jar and the master URL on the Spark context, but how do I run
it from code automatically when a condition becomes true, without actually using
spark-submit?

Is it possible?
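To make the idea concrete, here is a minimal sketch of what I have in mind, under the
assumption that the existing Java application itself becomes the driver and ships the job
jar through the JavaSparkContext constructor; the class name ConditionalSubmit and the
condition are placeholders:

import org.apache.spark.api.java.JavaSparkContext;

public class ConditionalSubmit {
    public static void main(String[] args) {
        if (isConditionTrue()) {
            runSpark("NetworkCount", "spark://abc.test.com:7077",
                     "try/simple-project/target/simple-project-1.0-jar-with-dependencies.jar");
        }
    }

    static boolean isConditionTrue() {
        return true; // placeholder condition
    }

    static void runSpark(String appName, String masterUrl, String jar) {
        // This JVM becomes the driver; the jar is shipped to the executors.
        JavaSparkContext sc = new JavaSparkContext(
                masterUrl, appName, System.getenv("SPARK_HOME"), new String[] { jar });
        try {
            // job logic goes here, e.g. sc.textFile(...).count()
        } finally {
            sc.stop();
        }
    }
}

This keeps the decision of when to run inside the host application, at the cost of the
host JVM acting as the Spark driver.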



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Submiting-Spark-application-through-code-tp17452.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: Spark Streaming Applications

2014-10-28 Thread sivarani
Hi tdas, is it possible to run Spark 24/7? I am using updateStateByKey and streaming
about 3 lakh (300,000) records in half an hour. I am not getting the correct result, and
I am also not able to run Spark Streaming 24/7: after a few hours I get an
ArrayIndexOutOfBoundsException even if I am not streaming anything. By the way, will the
streaming end if I am not streaming anything for a few minutes? Please help me out here.

Also, is it possible to delete state? It is growing exponentially, and not all of the
data is updated; at some point we have to reset it, right? How do I do that? I am able to
work with batch processing in Spark successfully, but streaming is quite a mystery for
me.

I am submitting my Spark application in the following fashion:

bin/spark-submit --class NetworkCount --master spark://abc.test.com:7077 
try/simple-project/target/simple-project-1.0-jar-with-dependencies.jar

But is there any other way to submit a Spark application through code?

For example, I check for a condition, and if it is true I want to run the Spark
application:

if (isConditionTrue) {
    runSpark("NetworkCount", masterUrl, jar);
}

I am aware we can set the jar and the master URL on the Spark context, but how do I run
it from code automatically when a condition becomes true, without actually using
spark-submit?

Is it possible?




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Applications-tp16976p17453.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: Spark Streaming - How to remove state for key

2014-10-28 Thread sivarani
I am having the same issue: I am using updateStateByKey, and over a period a set of data
will no longer change; I would like to save it and then delete it from the state. Have
you found the answer? Please share your views. Thanks for your time.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-How-to-remove-state-for-key-tp5534p17454.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: Submiting Spark application through code

2014-10-28 Thread sivarani
Hi,

I know we can create a streaming context with new JavaStreamingContext(master, appName,
batchDuration, sparkHome, jarFile),

but to run the application we still have to use

spark-home/spark-submit --class NetworkCount

I want to skip submitting manually; I want to invoke this Spark app when a condition
becomes true from within another running Java application.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Submiting-Spark-application-through-code-tp17452p17461.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: checkpoint and not running out of disk space

2014-10-20 Thread sivarani
I am new to Spark; I am using Spark Streaming with Kafka.

My batch duration is 1 second.

Assume I get 100 records in second 1, 120 records in second 2, and 80 records in second 3:

-- {sec 1: 1, 2, ... 100} -- {sec 2: 1, 2, ... 120} -- {sec 3: 1, 2, ... 80}

I apply my logic in second 1 and have a result, result1.

I want to use result1 in second 2 and have a combined result of both result1 and the 120
records of second 2, as result2.

I tried to cache the result, but I am not able to get the cached result1 in second 2. Is
it possible? Or can someone shed some light on how to achieve my goal here?

JavaPairReceiverInputDStream<String, String> messages =
    KafkaUtils.createStream(jssc, String.class, String.class,
        StringDecoder.class, StringDecoder.class, kafkaParams, topicMap,
        StorageLevel.MEMORY_AND_DISK_SER_2());

I process the messages and find words, which is the result for second 1...

if (resultCp != null) {
    resultCp.print();
    result = resultCp.union(words.mapValues(new Sum()));
} else {
    result = words.mapValues(new Sum());
}

resultCp = result.cache();
In second 2, resultCp should not be null, but it returns a null value, so at any given
time I only have that particular second's data; I want to find the cumulative result.
Does anyone know how to do this?

I learnt that once Spark Streaming is started with jssc.start(), control is no longer at
our end; it lies with Spark. So is it possible to send the result of second 1 into second
2 to find the accumulated value?
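For what it is worth, here is a minimal sketch of the direction I am considering: instead
of caching a reference at the driver, keep the accumulation across batches in Spark's own
state via updateStateByKey, with a checkpoint directory set. The checkpoint path is a
placeholder, words is assumed to be a JavaPairDStream<String, Double> of per-key values,
and updateFunction is assumed to be a running-sum update function like the ones discussed
elsewhere in this thread:

// assumed: jssc from the snippet above, and an updateFunction that adds the
// new per-batch sums to the running total for each key (not shown here)
jssc.checkpoint("hdfs:///tmp/streaming-checkpoint"); // placeholder directory

// The streaming state, not a cached driver-side reference, carries the
// result of second 1 into second 2, second 3, and so on.
JavaPairDStream<String, Double> cumulative =
        words.mapValues(new Sum()).updateStateByKey(updateFunction);
cumulative.print();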

Any help is much appreciated. Thanks in advance.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/checkpoint-and-not-running-out-of-disk-space-tp1525p16790.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
