Dear all,
I am happy to announce the release of Rumble 1.1 beta, the JSONiq engine that
queries heterogeneous and nested JSON data on top of Apache Spark.
Until version 1.0, FLWOR expressions were mapped to Spark RDDs.
But in a student project last semester, Can managed to remap FLWOR expressions
to DataFrames while keeping support for heterogeneous data intact. The
result is Rumble 1.1, with a notable performance improvement: twice as fast for
grouping and sorting.
From the user's perspective, nothing changes -- except the speed.
These are a few examples of use cases that show how the JSONiq syntax (95%
inherited from XQuery) is as compact as SQL, but seamlessly deals with
heterogeneity and nestedness:
1. How many persons in my dataset?
count(json-file("persons.json"))
2. What are all the cities they come from?
distinct-values(json-file("persons.json").addresses[].city)
3. How many persons in each country?
for $i in json-file("persons.json")
group by $c := $i.country
return {
  "Country" : $c,
  "Number" : count($i)
}
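Sorting, the other operation that benefits from the speedup, follows the same
pattern. As a sketch only, assuming each person object carries a numeric "age"
field and a "name" field (both hypothetical, not taken from the examples above):

```jsoniq
(: Hypothetical fields: "age" and "name" are assumed for illustration. :)
for $i in json-file("persons.json")
order by $i.age descending
return $i.name
```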
If you want to try it out (no need for a cluster: it also spreads computation
across your local cores), you can download it for free (open source) here:
http://rumbledb.org/
Thanks and kind regards,
Ghislain
_______________________________________________
http://x-query.com/mailman/listinfo/talk