I'm using Elasticsearch with elasticsearch-spark BUILD-SNAPSHOT and Spark/Spark SQL 1.2.0, on Costin Leau's advice.
I want to query Elasticsearch for a set of JSON documents from within Spark SQL, then run a SQL query that selects a single column (which is really a JSON key) -- the normal thing Spark SQL does with the SQLContext.jsonFile(filePath) facility. The only difference is that I am reading the documents through the Elasticsearch connector instead.

The big problem: when I run something like SELECT jsonKeyA FROM tempTable; I actually get the WRONG key's values out of the JSON documents! I discovered that if the JSON documents physically contain keys in the order D, C, B, A, the connector discovers those keys but then sorts them alphabetically as A, B, C, D -- so when I SELECT A FROM tempTable, I actually get column D (because the physical documents had key D in the first position). This only happens when reading from Elasticsearch through Spark SQL.

It gets much worse: when a key is missing from one of the documents (and that key's value should simply come back as NULL), the whole application crashes with a java.lang.IndexOutOfBoundsException -- the inferred schema is completely wrong. In the example above, with documents containing keys in the physical order D, C, B, A, if one document is missing the key/column I am querying for, I get that java.lang.IndexOutOfBoundsException.

I am using the BUILD-SNAPSHOT because, per Costin, the elasticsearch-spark project would not build for me otherwise.

Any clues here? Any fixes?
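To make the first symptom concrete, here is a small, self-contained Scala sketch of the mechanism I *suspect* is at play (this is my guess, not the connector's actual code): the discovered keys get sorted alphabetically for the schema while the row values stay in physical document order, so key A ends up paired with key D's value.

```scala
// Hypothetical illustration of the misalignment I'm seeing -- not the
// connector's real implementation.
object SchemaMismatch {
  def main(args: Array[String]): Unit = {
    // Physical key order and values as they appear in the JSON documents
    val docKeys   = Seq("D", "C", "B", "A")
    val docValues = Seq("dVal", "cVal", "bVal", "aVal")

    // Suspected buggy behavior: schema keys sorted alphabetically,
    // but row values left in document order
    val schemaKeys = docKeys.sorted              // A, B, C, D
    val buggyRow   = schemaKeys.zip(docValues).toMap

    // Correct behavior: each key paired with its own value
    val correctRow = docKeys.zip(docValues).toMap

    println(buggyRow("A"))   // prints dVal -- the value belonging to key D
    println(correctRow("A")) // prints aVal
  }
}
```

If a document is then missing one of the keys, the values list is shorter than the sorted schema, which would explain the IndexOutOfBoundsException rather than a NULL.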