Re: querying lots of quad files in block storage
Some potentially interesting projects:
https://sansa-stack.net/
https://github.com/rdfhdt/hdt-java

RDF HDT has a Jena integration component.

Adam

On Thu, Apr 14, 2022, 4:47 PM Martynas Jusevičius wrote:
> There was a related thread
> https://www.mail-archive.com/users@jena.apache.org/msg18577.html
>
> On Thu, 14 Apr 2022 at 22.42, Justin wrote:
> > [...]
Re: querying lots of quad files in block storage
Justin,

Are the query patterns spanning across the files? If not, then

Another thought: filter the data in some way. Keep the NQ files as the primary copy, but if there are subsets of the data that make sense, run a process to extract the relevant parts and build the database on that data.

Andy

On 14/04/2022 21:42, Justin wrote:
> [...]
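As a hedged illustration of the "extract the relevant parts" idea: if the useful subsets happened to line up with named graphs (an assumption, not something stated in the thread), a streaming pass over each .nq file could keep only the quads whose graph label is in a chosen set, and only that output would be loaded into TDB2. This is plain Java, not a Jena API; `graphOf` and `keepGraphs` are hypothetical names. The tokenizer is deliberately simplified (it assumes whitespace between terms and does not validate the full N-Quads grammar); in practice Jena's RIOT streaming parser would be the safer way to read the files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class NQuadsGraphFilter {
    // Hypothetical helper: returns the graph label of one N-Quads line (e.g. "<http://ex/g>"),
    // or null for a default-graph triple, a blank/comment line, or a malformed line.
    // Simplified: assumes terms are separated by whitespace.
    static String graphOf(String line) {
        String s = line.trim();
        if (s.isEmpty() || s.startsWith("#")) return null;
        List<String> terms = new ArrayList<>();
        int i = 0, n = s.length();
        while (i < n) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (c == '.') break;                          // statement terminator
            int start = i;
            if (c == '<') {                               // IRI: runs to the closing '>'
                int close = s.indexOf('>', i);
                if (close < 0) return null;
                i = close + 1;
            } else if (c == '"') {                        // literal, honouring backslash escapes
                i++;
                while (i < n && s.charAt(i) != '"') {
                    if (s.charAt(i) == '\\') i++;         // skip the escaped character
                    i++;
                }
                if (i >= n) return null;                  // unterminated literal
                i++;                                      // step past the closing quote
                if (i + 1 < n && s.charAt(i) == '^' && s.charAt(i + 1) == '^') {
                    int close = s.indexOf('>', i);        // datatype IRI
                    if (close < 0) return null;
                    i = close + 1;
                }
                while (i < n && !Character.isWhitespace(s.charAt(i))) i++; // language tag, if any
            } else {                                      // blank node such as _:b0
                while (i < n && !Character.isWhitespace(s.charAt(i))) i++;
            }
            terms.add(s.substring(start, i));
        }
        return terms.size() == 4 ? terms.get(3) : null;   // 4th term is the graph label
    }

    // Keeps only the lines whose graph label is in `graphs`. A real run would
    // stream lines from each .nq file instead of taking a List.
    static List<String> keepGraphs(List<String> lines, Set<String> graphs) {
        return lines.stream()
                .filter(l -> {
                    String g = graphOf(l);
                    return g != null && graphs.contains(g);
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "<http://ex/s> <http://ex/p> \"a \\\"quoted\\\" value\"@en <http://ex/g1> .",
            "<http://ex/s> <http://ex/p> \"42\"^^<http://www.w3.org/2001/XMLSchema#integer> <http://ex/g2> .",
            "_:b0 <http://ex/p> <http://ex/o> .");
        System.out.println(keepGraphs(lines, Set.of("<http://ex/g1>")).size()); // prints 1
    }
}
```

The point of filtering before loading is that the NQ files stay the primary copy, while each TDB2 database only holds the slice that the queries actually touch.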
Re: querying lots of quad files in block storage
There was a related thread:
https://www.mail-archive.com/users@jena.apache.org/msg18577.html

On Thu, 14 Apr 2022 at 22.42, Justin wrote:
> [...]
querying lots of quad files in block storage
Hello,

I am looking to see if Jena is a good fit for querying many billions of quads (in thousands of .nq files) sitting in object storage (such as AWS S3). The .nq files don't change, but new .nq files do get added to S3. Update queries are not needed: just SELECT, CONSTRUCT, ASK, etc.

It would be easy to iterate over all the files and produce TDB2 databases on a filesystem (on AWS EBS or EFS)...

Has anyone gone down this path and has some wisdom to share? I understand queries won't be as snappy as querying a single TDB2 database.

Thanks,
Justin
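Because the existing .nq files never change and new ones only get added, the "iterate over all the files and produce TDB2s" approach only needs to load each file once. A minimal sketch of that bookkeeping, assuming the object keys have already been listed from the bucket (e.g. with the AWS SDK) and that the set of already-loaded keys is recorded somewhere; `pendingFiles` is a hypothetical helper, not a Jena or AWS API. Each returned key would then be loaded into a TDB2 database, for example with Jena's `tdb2.tdbloader` command-line tool.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class IncrementalLoad {
    // Hypothetical helper: from every key currently in the bucket and the keys already
    // loaded into TDB2, return the N-Quads files that still need loading.
    // Since .nq files are immutable, a key that has been loaded never needs reloading.
    static List<String> pendingFiles(Collection<String> allKeys, Set<String> loaded) {
        return allKeys.stream()
                .filter(k -> k.endsWith(".nq"))    // only N-Quads files
                .filter(k -> !loaded.contains(k))  // skip files already loaded
                .sorted()                          // deterministic load order
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> keys = List.of("data/b.nq", "data/a.nq", "data/README.txt", "data/c.nq");
        Set<String> loaded = Set.of("data/a.nq");
        System.out.println(pendingFiles(keys, loaded)); // prints [data/b.nq, data/c.nq]
    }
}
```

Recording loaded keys (rather than, say, timestamps) keeps the process idempotent: rerunning it after a failure simply picks up whatever was not yet loaded.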