PySpark elasticsearch question

Mohamed Lrhazi Tue, 09 Dec 2014 05:18:09 -0800

Hello,

Following a couple of tutorials, I can't seem to get PySpark to return any
"fields" from ES other than the document ID.


This is what I tried:

es_rdd = sc.newAPIHadoopRDD(
    inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf={"es.resource": "en_2004/doc", "es.nodes": "rap-es2.uis",
          "es.query": "?fields=title,_source"})

es_rdd.take(1)

It always shows:

Out[13]: [(u'en_20040726_fbis_116728340038', {})]

How does one get more fields?


Thanks,
Mohamed.
