Hi All,
I connected pyspark under Zeppelin to my Elasticsearch DB and I am able to do
this:
%pyspark
es_rdd = sc.newAPIHadoopRDD(
    inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf={"es.resource": "logstash-uni-*"})
es_rdd.toDF().registerTempTable("elk")
and then
%sql select * from elk
What I get is a table with just two columns. One is some object ID, I guess,
and the other is a single string holding a mapping of all the fields in the ES
record to their values ( " Map(@timestamp -> 2016-03-16T14:31:12.861Z, host -> ..."
).
My question is: how do I create a Spark table, or even just a Python object
( probably a dict ), that will let me access each field separately?
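For context, here is the kind of thing I was hoping for, as a minimal sketch. I am assuming the RDD yields (doc_id, field_map) pairs and that each value map can be turned into a plain dict; the `record_to_dict` helper and the sample record below are hypothetical, just to show the shape I expect:

```python
# Hypothetical helper: assuming each es_rdd element is an (id, mapping) pair,
# turn the value mapping into a plain Python dict so fields are addressable.
def record_to_dict(kv):
    doc_id, fields = kv
    return dict(fields)

# In Zeppelin I imagine applying it roughly like this (untested sketch):
#   from pyspark.sql import Row
#   dict_rdd = es_rdd.map(record_to_dict)
#   df = dict_rdd.map(lambda d: Row(**d)).toDF()
#   df.registerTempTable("elk")

# Local demonstration with a sample record shaped like what I see from ES:
sample = ("some-doc-id",
          {"@timestamp": "2016-03-16T14:31:12.861Z", "host": "uni-01"})
rec = record_to_dict(sample)
print(rec["host"])  # → uni-01
```

Is something along these lines the right direction, or is there a better way?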
Thanks,
Oren