Re: querying lots of quad files in block storage

2022-04-22 Thread ajs6f
Some potentially interesting projects:

https://sansa-stack.net/

https://github.com/rdfhdt/hdt-java

RDF HDT has a Jena integration component.

Adam

On Thu, Apr 14, 2022, 4:47 PM Martynas Jusevičius wrote:

> There was a related thread
> https://www.mail-archive.com/users@jena.apache.org/msg18577.html
>
> On Thu, 14 Apr 2022 at 22.42, Justin wrote:
>
> > Hello,
> >
> > I am looking to see if Jena is a good fit for querying many billion quads
> > (in thousands of .nq files) sitting in block storage (like AWS S3). The .nq
> > files don't change. New .nq files do get added to S3, however. Also update
> > queries are not needed -- just selects, constructs, asks, etc.
> >
> > It would be easy to iterate over all the files and produce TDB2s in a
> > filesystem (on AWS EBS or EFS)...
> >
> > Has anyone gone down this path and have some wisdom to share?
> > I understand queries won't be as snappy as querying a single TDB2.
> >
> > Thanks,
> > Justin
> >
>


Re: querying lots of quad files in block storage

2022-04-20 Thread Andy Seaborne

Justin,

Are the query patterns spanning across the files?
If not, then

Another thought: filter the data in some way. Keep the NQ files as the
primary copy, but if there are subsets of the data that make sense, run a
process to extract the relevant parts and build the database on that data.
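
As a rough illustration of that extraction step, here is a minimal sketch that streams N-Quads lines and keeps only those with a predicate of interest. Filtering by predicate is just one possible way to define a subset, not something prescribed here. The function name `keep_predicates` and the `http://ex.org/` IRIs are hypothetical. The sketch assumes terms are separated by whitespace, as most serializers emit them; it is not a full N-Quads parser.

```python
from typing import Iterable, Iterator, Set


def keep_predicates(lines: Iterable[str], wanted: Set[str]) -> Iterator[str]:
    """Yield N-Quads lines whose predicate IRI is in `wanted` (hypothetical helper)."""
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and comments.
        if not stripped or stripped.startswith("#"):
            continue
        # Subject and predicate contain no whitespace, so the first two
        # whitespace-separated tokens can be split off without parsing the
        # object (which may be a literal containing spaces).
        parts = stripped.split(None, 2)
        if len(parts) < 3:
            continue  # malformed line; a real job would probably log it
        predicate = parts[1]
        if (
            predicate.startswith("<")
            and predicate.endswith(">")
            and predicate[1:-1] in wanted
        ):
            yield stripped


# Made-up sample data, purely for illustration.
sample = [
    '<http://ex.org/a> <http://ex.org/name> "Alice Smith" <http://ex.org/g1> .',
    '<http://ex.org/a> <http://ex.org/age> "42" <http://ex.org/g1> .',
    "# a comment",
    '_:b0 <http://ex.org/name> "Bob" <http://ex.org/g2> .',
]
subset = list(keep_predicates(sample, {"http://ex.org/name"}))
```

Because the function is a generator, it can be fed an open file handle and the output written straight to a smaller .nq file, which would then be the input for building the database.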


Andy

On 14/04/2022 21:42, Justin wrote:

Hello,

I am looking to see if Jena is a good fit for querying many billion quads
(in thousands of .nq files) sitting in block storage (like AWS S3). The .nq
files don't change. New .nq files do get added to S3, however. Also update
queries are not needed -- just selects, constructs, asks, etc.

It would be easy to iterate over all the files and produce TDB2s in a
filesystem (on AWS EBS or EFS)...

Has anyone gone down this path and have some wisdom to share?
I understand queries won't be as snappy as querying a single TDB2.

Thanks,
Justin



Re: querying lots of quad files in block storage

2022-04-14 Thread Martynas Jusevičius
There was a related thread
https://www.mail-archive.com/users@jena.apache.org/msg18577.html

On Thu, 14 Apr 2022 at 22.42, Justin wrote:

> Hello,
>
> I am looking to see if Jena is a good fit for querying many billion quads
> (in thousands of .nq files) sitting in block storage (like AWS S3). The .nq
> files don't change. New .nq files do get added to S3, however. Also update
> queries are not needed -- just selects, constructs, asks, etc.
>
> It would be easy to iterate over all the files and produce TDB2s in a
> filesystem (on AWS EBS or EFS)...
>
> Has anyone gone down this path and have some wisdom to share?
> I understand queries won't be as snappy as querying a single TDB2.
>
> Thanks,
> Justin
>


querying lots of quad files in block storage

2022-04-14 Thread Justin
Hello,

I am looking to see if Jena is a good fit for querying many billion quads
(in thousands of .nq files) sitting in block storage (like AWS S3). The .nq
files don't change. New .nq files do get added to S3, however. Also update
queries are not needed -- just selects, constructs, asks, etc.

It would be easy to iterate over all the files and produce TDB2s in a
filesystem (on AWS EBS or EFS)...
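
To make the "iterate over all the files" idea concrete, here is a minimal sketch that groups a list of .nq files into batches and builds one `tdb2.tdbloader` command line per batch. It only plans the commands and does not run them. It assumes the files have already been copied from S3 to local disk and that Jena's command-line tools are on the PATH. The function name `plan_loads`, the batch size, and the `db-NNNN` directory naming are all hypothetical choices.

```python
from typing import List


def plan_loads(files: List[str], db_root: str, batch_size: int) -> List[List[str]]:
    """Return one tdb2.tdbloader argument list per batch of files (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ordered = sorted(files)  # stable order so reruns produce the same batches
    commands = []
    for start in range(0, len(ordered), batch_size):
        batch = ordered[start:start + batch_size]
        # One database directory per batch; the naming scheme is an assumption.
        db_dir = f"{db_root}/db-{start // batch_size:04d}"
        commands.append(["tdb2.tdbloader", "--loc", db_dir, *batch])
    return commands


# Made-up file names, purely for illustration.
commands = plan_loads(["c.nq", "a.nq", "b.nq"], "/data/tdb2", batch_size=2)
```

Each argument list could then be handed to `subprocess.run(cmd, check=True)`. Since new .nq files are only ever added, a new batch can go into a new database directory without touching the existing ones.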

Has anyone gone down this path and have some wisdom to share?
I understand queries won't be as snappy as querying a single TDB2.

Thanks,
Justin