Ah, thanks Jerven. How do you deal with HTTP-layer query timeouts? Are you able to predict them for certain common queries rather than waiting for the timeout to hit?

S

On Fri, Jan 13, 2023 at 4:02 AM Jerven Tjalling Bolleman <[email protected]> wrote:

> Hi All,
>
> Regarding these FAIR use settings. They are tuneable and may be turned off,
> so the specific values that Openlink uses may or may not be used if
> Wikidata were to host a Virtuoso instance itself.
>
> e.g. for sparql.uniprot.org you are unlikely to run into these limits (as
> the values are set very high indeed) and are more likely to suffer from
> settings around the http layer that limit query run time due to connection
> issues.
>
> Regards,
> Jerven
>
> On 1/12/23 11:45 PM, Kingsley Idehen via Wikidata wrote:
>
> On 1/12/23 3:39 AM, Larry Gonzalez wrote:
>
> Dear Kingsley,
>
> Let me start by saying that I appreciate the effort of loading the
> complete Wikidata dataset into a graph database and making a SPARQL
> endpoint available. I know it is not an easy task.
>
> I just tried out the new Virtuoso-hosted SPARQL endpoint with some
> queries. My experiments are not exhaustive at all, but I just wanted to
> raise two concerns that I noticed.
>
> Consider a (very simple) query that counts all humans:
>
> '''
> SELECT (count(?human) as ?c)
> WHERE
> {
>   ?human wdt:P31 wd:Q5 .
> }
> '''
>
> I get a result of 10396057, which is ok considering the dataset that you
> are using.
>
> But if we try to export all instances of human (to a TSV file) with the
> following query:
>
> '''
> SELECT ?human
> WHERE
> {
>   ?human wdt:P31 wd:Q5 .
> }
> '''
>
> Then I only get 100000 results. Is there a limit on the number of
> results that a query can have?
>
> Yes, because these services are primarily for ad-hoc querying rather than
> wholesale data exports. If you want to export massive amounts of data then
> you can do so using OFFSET and LIMIT.
>
> Alternatively, you can instantiate your own instance in the Azure or AWS
> cloud and use it as you see fit.
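
The OFFSET and LIMIT export suggested above can be sketched roughly as follows. This is only an illustration, not the endpoint's documented procedure: `page_query`, `export_all`, and the `run_query` callable are hypothetical names, and the `wdt:`/`wd:` prefixes are assumed to be predefined by the endpoint. Note that SPARQL does not guarantee a stable row order without ORDER BY, so pages fetched this way may overlap or skip rows, and the thread reports that ORDER BY over this result set times out.

```python
from typing import Callable, Iterator, List

# Query from the thread; prefixes wdt: and wd: are assumed to be
# predefined on the endpoint.
HUMANS_QUERY = """
SELECT ?human
WHERE
{
  ?human wdt:P31 wd:Q5 .
}
"""


def page_query(base_query: str, limit: int, offset: int) -> str:
    """Append LIMIT/OFFSET solution modifiers to a SELECT query."""
    return f"{base_query.rstrip()}\nLIMIT {limit}\nOFFSET {offset}"


def export_all(
    run_query: Callable[[str], List[str]],
    base_query: str,
    page_size: int = 100_000,
) -> Iterator[str]:
    """Yield rows page by page until a page comes back short.

    run_query is a hypothetical callable that sends one query string to
    the endpoint and returns its rows; how it talks HTTP is up to you.
    """
    offset = 0
    while True:
        rows = run_query(page_query(base_query, page_size, offset))
        yield from rows
        if len(rows) < page_size:
            break  # last (possibly empty) page reached
        offset += page_size
```

The default page size of 100000 simply mirrors the result cap reported in the thread; a smaller value may be kinder to a shared endpoint.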
>
> Like what we provide regarding DBpedia, there's a server-side
> configuration in place for enforcing a "fair use" policy :)
>
> Furthermore, if we want to get all humans ordered by id, then the endpoint
> times out. The following is the query:
>
> '''
> SELECT ?human
> WHERE
> {
>   ?human wdt:P31 wd:Q5 .
> }
> ORDER BY DESC(?human)
> '''
>
> If you set the query timeout to a value over 1000 msecs, the Virtuoso
> Anytime Query feature will provide you with a partial solution which you
> can use in conjunction with OFFSET and LIMIT to create an interactive
> cursor (or scrollable cursor). Beyond that, it's back to the "fair use"
> policy and the option to instantiate your own service-specific instance
> using our cloud offerings.
>
> Regards,
>
> Kingsley
>
> Thank you again for all your efforts. I am looking forward to seeing how
> this new endpoint works :)
>
> Are you planning to update the dataset regularly?
>
> All the best!
> Larry
>
> https://iccl.inf.tu-dresden.de/web/Larry_Gonzalez
>
> On 11.01.23 21:51, Kingsley Idehen via Wikidata wrote:
>
> All,
>
> We are pleased to announce immediate availability of a new
> Virtuoso-hosted Wikidata instance based on the most recent datasets. This
> instance comprises 17 billion+ RDF triples.
>
> Host Machine Info:
>
> Item    Value
> CPU     2x Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
> Cores   24
> Memory  378 GB
> SSD     4x Crucial M4 SSD 500 GB
>
> Cloud related costs for a self-hosted variant, assuming:
>
> * dedicated machine for 1 year without upfront costs
> * 128 GiB memory
> * 16 cores or more
> * 512GB SSD for the database
> * 3T outgoing internet traffic (based on our DBpedia statistics)
>
> vendor  machine type  memory   vCPUs  monthly machine  monthly disk  monthly network  monthly total
> Amazon  r5a.4xlarge   128 GiB  16     $479.61          $55.96        $276.48          $812.05
> Google  e2highmem-16  128 GiB  16     $594.55          $95.74        $255.00          $945.30
> Azure   D32a          128 GiB  32     $769.16          $38.40        $252.30          $1,060.06
>
> SPARQL Query and Full Text Search service endpoints:
>
> * https://wikidata.demo.openlinksw.com/sparql -- SPARQL Query Services Endpoint
> * https://wikidata.demo.openlinksw.com/fct -- Faceted Search & Browsing
>
> Additional Information
>
> * Loading the Wikidata dataset 2022/12 into Virtuoso Open Source -
>   Announcements - OpenLink Software Community (openlinksw.com)
>   <https://community.openlinksw.com/t/loading-the-wikidata-dataset-2022-12-into-virtuoso-open-source/3580>
>
> Happy New Year!
>
> --
> Regards,
>
> Kingsley Idehen
> Founder & CEO
> OpenLink Software
> Home Page: http://www.openlinksw.com
> Community Support: https://community.openlinksw.com
> Weblogs (Blogs):
> Company Blog: https://medium.com/openlink-software-blog
> Virtuoso Blog: https://medium.com/virtuoso-blog
> Data Access Drivers Blog:
> https://medium.com/openlink-odbc-jdbc-ado-net-data-access-drivers
>
> Personal Weblogs (Blogs):
> Medium Blog: https://medium.com/@kidehen
> Legacy Blogs: http://www.openlinksw.com/blog/~kidehen/
>               http://kidehen.blogspot.com
>
> Profile Pages:
> Pinterest: https://www.pinterest.com/kidehen/
> Quora: https://www.quora.com/profile/Kingsley-Uyi-Idehen
> Twitter: https://twitter.com/kidehen
> Google+: https://plus.google.com/+KingsleyIdehen/about
> LinkedIn: http://www.linkedin.com/in/kidehen
>
> Web Identities (WebID):
> Personal: http://kingsley.idehen.net/public_home/kidehen/profile.ttl#i
>         : http://id.myopenlink.net/DAV/home/KingsleyUyiIdehen/Public/kingsley.ttl#this
>
> _______________________________________________
> Wikidata mailing list -- [email protected]
> Public archives at
> https://lists.wikimedia.org/hyperkitty/list/[email protected]/message/TI7U5Q6ZBEEPCNSTZ2KYLEXEDO4E4GMG/
> To unsubscribe send an email to [email protected]

--
Samuel Klein @metasj w:user:sj +1 617 529 4266
_______________________________________________ Wikidata mailing list -- [email protected] Public archives at https://lists.wikimedia.org/hyperkitty/list/[email protected]/message/26CGOE5LZESW5Q5ADJP4Z3ZDX6MU6SBT/ To unsubscribe send an email to [email protected]
