I have 14T SSD (RAID 0)

On Mon, 13 Jul 2020 at 21:19, Amirouche Boubekki <[email protected]> wrote:
>
> On Mon, 13 Jul 2020 at 19:42, Adam Sanchez <[email protected]> wrote:
> >
> > Hi,
> >
> > I have to launch 2 million queries against a Wikidata instance.
> > I have loaded Wikidata into Virtuoso 7 (512 GB RAM, 32 cores, SSD disks
> > in RAID 0).
> > The queries are simple, just 2 types.
>
> How much SSD, in gigabytes, do you have?
>
> > select ?s ?p ?o {
> >   ?s ?p ?o .
> >   filter (?s = ?param)
> > }
>
> Is that the same as:
>
>   select ?p ?o {
>     ?param ?p ?o
>   }
>
> where ?param is bound to one of the two million params?
>
> > select ?s ?p ?o {
> >   ?s ?p ?o .
> >   filter (?o = ?param)
> > }
> >
> > If I use a Java ThreadPoolExecutor, it takes 6 hours.
> > How can I speed up the query processing even more?
> >
> > I was thinking:
> >
> > a) to implement a Virtuoso cluster to distribute the queries, or
> > b) to load Wikidata into a Spark dataframe (since the Sansa framework
> >    is very slow, I would use my own implementation), or
> > c) to load Wikidata into a PostgreSQL table and use Presto to
> >    distribute the queries, or
> > d) to load Wikidata into a PG-Strom table to use GPU parallelism.
> >
> > What do you think? I am looking for ideas.
> > Any suggestion will be appreciated.
> >
> > Best,
> >
> > _______________________________________________
> > Wikidata mailing list
> > [email protected]
> > https://lists.wikimedia.org/mailman/listinfo/wikidata
>
> --
> Amirouche ~ https://hyper.dev
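[Editor's note: on the ThreadPoolExecutor side, one common way to cut per-request overhead is to batch many parameter values into a single SPARQL query (e.g. with a VALUES block) per task, rather than issuing one query per param. A minimal, hypothetical Java sketch of that fan-out, where `runBatch` is only a stub standing in for the real HTTP call to the Virtuoso endpoint, and the class name, thread count, and batch size are illustrative assumptions, not anything from this thread:]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class BatchedQueryRunner {

    // Stub for the real work: in practice this would POST one SPARQL query
    // such as  SELECT ?s ?p ?o { VALUES ?s { <v1> <v2> ... } ?s ?p ?o }
    // to the SPARQL endpoint and parse the result rows.
    static int runBatch(List<String> params) {
        return params.size(); // pretend each param matched exactly one row
    }

    // Fan the parameter list out over a fixed pool, batchSize params per task,
    // and return the total number of rows reported by all batches.
    static int run(List<String> allParams, int threads, int batchSize)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger rows = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        for (int start = 0; start < allParams.size(); start += batchSize) {
            List<String> batch = allParams.subList(
                    start, Math.min(start + batchSize, allParams.size()));
            futures.add(pool.submit(() -> rows.addAndGet(runBatch(batch))));
        }
        for (Future<?> f : futures) f.get(); // surface any task failure
        pool.shutdown();
        return rows.get();
    }

    public static void main(String[] args) throws Exception {
        List<String> params = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) params.add("param" + i); // demo size
        System.out.println("rows: " + run(params, 32, 1000)); // prints "rows: 10000"
    }
}
```

With 1000 params per VALUES block, 2 million params become ~2000 requests instead of 2 million, which is usually where most of the wall-clock win comes from before considering clusters or GPU options.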
