Hello,

I need to split up my index because it can grow very large, and I'm not sure 
which approach to take, so I'm looking for some advice. For now the index 
still fits on one machine, so there is no need yet to build a distributed 
system. My main concern is search performance once the index grows to 
gigabytes in size. I think it is enough to have multiple datasets plus some 
mechanism that decides which data goes into which dataset.
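
To make the idea a bit more concrete, here is a rough sketch of the kind of 
mechanism I have in mind. It is plain Java and does not use any Jena API. The 
class name DatasetRouter, the rule of routing by predicate IRI prefix, and the 
dataset names are all made up for illustration; the real criterion would 
probably come from the ontologies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Jena): picks a dataset name for a triple
// based on the IRI of its predicate.
final class DatasetRouter {
    // Rules are kept in insertion order so that the first match wins.
    private final Map<String, String> prefixToDataset = new LinkedHashMap<>();
    private final String defaultDataset;

    DatasetRouter(String defaultDataset) {
        this.defaultDataset = defaultDataset;
    }

    // Register a rule: predicates whose IRI starts with the given prefix
    // are stored in the given dataset.
    DatasetRouter rule(String predicatePrefix, String dataset) {
        prefixToDataset.put(predicatePrefix, dataset);
        return this;
    }

    // Return the dataset of the first matching rule, or the default dataset
    // if no rule matches.
    String datasetFor(String predicateIri) {
        for (Map.Entry<String, String> e : prefixToDataset.entrySet()) {
            if (predicateIri.startsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return defaultDataset;
    }
}
```

With a rule such as rule("http://xmlns.com/foaf/0.1/", "people"), every triple 
with a FOAF predicate would end up in the "people" dataset and everything else 
in the default one.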

There is one key goal that must be fulfilled: SPARQL queries sent to the 
server must not need to know that the system may be decentralized or even 
distributed. In other words, the user writes a SPARQL query as if all the 
data were in one big index, and it is the job of the indexer to figure out 
how to process the query correctly.

Fulfilling this goal is possible because the ontologies used by my system 
form a graph: there are relations between the classes and properties in the 
system. What I haven't figured out yet is at which point the system can 
inspect the ontologies and use that information to decide which datasets it 
has to read.
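
As an illustration of what "looking at the ontologies" could mean, here is a 
small sketch, again in plain Java without Jena. It assumes that each class is 
assigned to one dataset and that a query mentioning a class also has to read 
the datasets of all its subclasses. The name OntologyDatasetSelector and this 
assignment scheme are only assumptions for the example.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper (not part of Jena): given a class used in a query,
// collects the datasets that hold that class or any of its subclasses.
final class OntologyDatasetSelector {
    private final Map<String, Set<String>> subClasses = new HashMap<>();
    private final Map<String, String> datasetOfClass = new HashMap<>();

    // Record that 'child' is a direct subclass of 'parent'.
    void addSubClass(String parent, String child) {
        subClasses.computeIfAbsent(parent, k -> new HashSet<>()).add(child);
    }

    // Record which dataset stores the instances of a class.
    void assign(String cls, String dataset) {
        datasetOfClass.put(cls, dataset);
    }

    // Walk the subclass graph from the queried class and gather the datasets
    // of every class reached. The 'seen' set guards against cycles.
    Set<String> datasetsFor(String queriedClass) {
        Set<String> result = new TreeSet<>();
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(queriedClass);
        while (!todo.isEmpty()) {
            String c = todo.pop();
            if (!seen.add(c)) {
                continue; // already visited
            }
            String ds = datasetOfClass.get(c);
            if (ds != null) {
                result.add(ds);
            }
            for (String sub : subClasses.getOrDefault(c, Collections.emptySet())) {
                todo.push(sub);
            }
        }
        return result;
    }
}
```

The open question is where a lookup like this would be called from while a 
query is being evaluated.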

I thought it would make the most sense to implement this mechanism inside 
Jena's SPARQL evaluation engine. The engine already has to work out what a 
SPARQL query means, so adding more logic there seems the way to go. My 
questions: Is it possible to access or alter the behavior of the SPARQL 
evaluation engine, and if so, how? Can anyone think of another way to solve 
my problem without going very deep into Jena itself?

Simon