Hi all, I'm sure this is a simple question, but I can't find it anywhere in the docs. The code below reads from Flume and writes to HBase (as you can see), but I believe it has a variable scope problem. I have the following code:
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.flume import FlumeUtils
    from datetime import datetime

    ssc = StreamingContext(sc, 5)

    conf = {"hbase.zookeeper.quorum": "ubuntu3",
            "hbase.mapred.outputtable": "teste2",
            "mapreduce.outputformat.class": "org.apache.hadoop.hbase.mapreduce.TableOutputFormat",
            "mapreduce.job.output.key.class": "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
            "mapreduce.job.output.value.class": "org.apache.hadoop.io.Writable"}
    keyConv = "org.apache.spark.examples.pythonconverters.StringToImmutableBytesWritableConverter"
    valueConv = "org.apache.spark.examples.pythonconverters.StringListToPutConverter"

    lines = FlumeUtils.createStream(ssc, 'ubuntu3', 9997)
    words = lines.map(lambda line: line[1])

    rowid = datetime.now().strftime("%Y%m%d%H%M%S")
    outrdd = words.map(lambda x: (str(1), [rowid, "cf1desc", "col1", x]))

    print("ok 1")
    outrdd.pprint()
    outrdd.foreachRDD(lambda x: x.saveAsNewAPIHadoopDataset(conf=conf, keyConverter=keyConv, valueConverter=valueConv))

    ssc.start()
    ssc.awaitTermination()

The issue is that the rowid variable always stays at the value it had when the streaming began. How can I get around this? I tried a function, an application, nothing worked.

Thank you.
jp
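To be clear about what I mean: rowid is evaluated once on the driver when the DStream graph is set up, and the map lambda just closes over that fixed string afterwards. Something like the following transform()-based version is what I would expect to give a fresh rowid per batch; this is an untested sketch, and with_batch_rowid is just a name I made up:

    def with_batch_rowid(rdd):
        # transform() invokes this function on the driver once per batch
        # interval, so rowid is re-evaluated for every batch instead of
        # once when the streaming context is defined.
        rowid = datetime.now().strftime("%Y%m%d%H%M%S")
        return rdd.map(lambda x: (str(1), [rowid, "cf1desc", "col1", x]))

    outrdd = words.transform(with_batch_rowid)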