[xquery-talk] [ANN] Rumble 1.1 -- switched to DataFrames, and 2x faster

Ghislain Fourny Thu, 08 Aug 2019 05:51:36 -0700

Dear all,

I am happy to announce the release of Rumble 1.1 beta, the JSONiq engine that 
queries heterogeneous and nested JSON data on top of Apache Spark.


Until version 1.0, FLWOR expressions were mapped to Spark RDDs.

But in a student project last semester, Can managed to remap FLWOR expressions 
to DataFrames while keeping support for heterogeneous data intact. The result 
is Rumble 1.1, with a notable performance improvement: grouping and sorting 
are twice as fast.

From the user's perspective, nothing changes -- except the speed.

These are a few examples of use cases that show how the JSONiq syntax (95% 
inherited from XQuery) is as compact as SQL, but seamlessly deals with 
heterogeneity and nestedness:

1. How many persons in my dataset?

count(json-file("persons.json"))

2. What are all the cities they come from?

distinct-values(json-file("persons.json").addresses[].city)

3. How many persons in each country?

for $i in json-file("persons.json")
group by $c := $i.country
return {
  "Country" : $c,
  "Number" : count($i)
}
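
Since grouping and sorting are the operations said to benefit most, a query 
that combines both might look like the sketch below. It reuses the 
persons.json file and the country field from example 3; ordering the groups 
by their size is only an illustration and is not taken from the release 
itself.

```jsoniq
(: Sketch: count persons per country, largest groups first. :)
(: Assumes each object in persons.json may carry a "country" field, as in example 3. :)
for $i in json-file("persons.json")
group by $c := $i.country
order by count($i) descending
return {
  "Country" : $c,
  "Number" : count($i)
}
```

After the group by clause, $i is bound to the sequence of all persons in the 
group, which is why count($i) can be used both for ordering and in the 
returned object.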

If you want to try it out (no cluster is needed: it also spreads computation 
across your local cores), you can download it for free (it is open source) here:

http://rumbledb.org/

Thanks and kind regards,
Ghislain


_______________________________________________
[email protected]
http://x-query.com/mailman/listinfo/talk
