Re: fuseki in HA

2018-02-22 Thread DAVID MOLINA ESTRADA
Hi Andy,

In the next few days I will study your proposal and the different possibilities.


Thank you


  David Molina Estrada

  Software Architect

-Andy Seaborne <a...@apache.org> wrote: -
To: users@jena.apache.org
From: Andy Seaborne <a...@apache.org>
Date: 22/02/2018 13:15
Subject: [MASSMAIL]Re: fuseki in HA

Hi David,

This is one of the main use cases for:

https://afs.github.io/rdf-delta/

and there is a Fuseki component in that build that incorporates the 
mechanism needed for 2+ Fusekis to propagate changes [3] (a custom 
service /dataset/patch that accepts patch files and applies them).

The work has two parts: the data format needed to propagate changes 
(RDF Patch [1]) and a patch log server [2].

Keeping these two components separate is important because not all 
situations will want a patch server. Distribution using Hazelcast or 
Kafka, or publishing changes in the style of Atom/RSS, are good examples. 
By having a defined patch format, there is no reason why the various 
triplestores even have to all be Jena-based.
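Concretely, propagating one change to a replica might look like the sketch below. The replica host, dataset path, and media type are assumptions for illustration, not taken from this mail; the patch row syntax (`TX`/`A`/`TC`) follows the RDF Patch docs in [1].

```python
# Build a minimal RDF Patch (one transaction adding a single triple)
# and POST it to a replica's custom /dataset/patch service.
# Host, dataset name, and Content-Type below are illustrative.
from urllib import request

patch = "\n".join([
    "TX .",                                            # begin transaction
    'A <http://example/s> <http://example/p> "o1" .',  # add one triple
    "TC .",                                            # commit
]) + "\n"

def send_patch(replica_url, body):
    # POST the patch body to one replica's patch endpoint.
    req = request.Request(
        replica_url + "/dataset/patch",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/rdf-patch"},
        method="POST",
    )
    request.urlopen(req)  # raises on a non-2xx response

# send_patch("http://replica1:3030", patch)  # repeat per replica
```

Applying the same patch to every replica, in order, is what keeps the copies convergent.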

Apache Licensed, not part of the Jena project.

Let me know what you think:

 Andy

[1] https://afs.github.io/rdf-delta/rdf-patch.html
[2] https://afs.github.io/rdf-delta/rdf-patch-logs.html
[3] https://github.com/afs/rdf-delta/tree/master/rdf-delta-fuseki

Disclosure: this is part of my $job at TopQuadrant.

There is no reason not to start publishing it to Maven Central; I just 
haven't had the need to so far.

The RDF patch work is based on previous work with Rob Vesse.

On 21/02/18 12:32, DAVID MOLINA ESTRADA wrote:
> Hi,
> 
> I want to build an HA web application based on a Fuseki server in HA too. My idea 
> is to create a Fuseki Docker image and deploy as many instances as I need. Querying 
> works fine, but I am trying to define a mechanism (perhaps based on topics with 
> Hazelcast or Kafka) to distribute changes to all nodes (both file uploads and 
> SPARQL updates).
> 
> Any recommendation or best practice? Has anybody done anything similar?
> 
> Thanks
> 
>   
> David Molina Estrada
> 
>




Re: Fuseki 2 HA or on-the-fly backups?

2015-08-24 Thread Jason Levitt
Great info, thanks.

> Some organisations achieve this by running a load balancer in front of
> several replicas then co-ordinating the update process.

So, they're running the same query against other nodes behind the load
balancer to keep things in sync?
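That co-ordination can be sketched as fanning the same SPARQL update out to every replica rather than routing it through the balancer. A rough illustration; the replica URLs and dataset name are invented for the example, and retries/rollback are deliberately omitted:

```python
# Apply the same SPARQL update to every replica so they stay in sync.
# Replica URLs and the /ds/update path are illustrative.
from urllib import parse, request

REPLICAS = ["http://replica1:3030", "http://replica2:3030"]

def fan_out_update(sparql_update, replicas=REPLICAS):
    # Build one form-encoded POST per replica's update endpoint.
    body = parse.urlencode({"update": sparql_update}).encode("utf-8")
    posts = [(url + "/ds/update", body) for url in replicas]
    # Uncomment to actually send against live servers:
    # for url, data in posts:
    #     request.urlopen(request.Request(url, data=data, method="POST"))
    return posts

posts = fan_out_update('INSERT DATA { <urn:x:s> <urn:x:p> "o" }')
```

A real deployment also has to decide what happens when one replica fails mid-update, which is exactly the problem a patch log or message queue addresses.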

> You can do a live backup

So, an HTTP POST to /$/backup/{name} initiates a backup and that
results in a gzip-compressed N-Quads file.

What does a restore look like from that file?

-J




On Mon, Aug 24, 2015 at 4:08 AM, Rob Vesse rve...@dotnetrdf.org wrote:
> Andy already answered 1, but more on 2:
>
> Assuming you use TDB, then in-memory checkpointing already happens. TDB
> caches data into memory but fundamentally is a persistent, disk-backed
> database that uses write-ahead logging for transactions and failure
> recovery, so this already happens automatically and is below the level of
> Fuseki (you get this behaviour wherever you use TDB, provided you use it
> transactionally, which Fuseki always does).
>
> Rob
>
> On 24/08/2015 05:51, Jason Levitt slimands...@gmail.com wrote:
>
>> Just wondering if there are any projects out there
>> to provide:
>>
>> 1) HA (high availability) configuration of Fuseki such
>> as mirroring or hot/standby failover.
>>
>> 2) Some kind of on-the-fly backup of Fuseki when it's
>> running in RAM. This might be similar to how Hadoop
>> 1.x checkpoints the in-RAM namenode data structures.
>>
>> BTW, are there any tools for testing the consistency of the Fuseki
>> data structures when Fuseki is temporarily halted?
>>
>> Cheers,
>>
>> Jason






Re: Fuseki 2 HA or on-the-fly backups?

2015-08-24 Thread Andy Seaborne

On 24/08/15 16:15, Jason Levitt wrote:

> So, an HTTP POST to /$/backup/{name} initiates a backup and that
> results in a gzip-compressed N-Quads file.
>
> What does a restore look like from that file?


You just load it into an empty database (tdbloader etc).

Andy
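A sketch of the full backup/restore cycle follows. The server URL and dataset name are illustrative, and the gzip step only simulates the server's dump locally so the example is self-contained:

```python
# Backup: POST to the backup endpoint; Fuseki writes a gzip-compressed
# N-Quads file. Restore: bulk-load that file into an empty database,
# e.g.  tdbloader --loc /path/to/new/db backup.nq.gz
import gzip

# 1. Trigger the backup (commented out: needs a running Fuseki):
# from urllib import request
# request.urlopen(request.Request("http://localhost:3030/$/backup/ds",
#                                 method="POST"))

# 2. Simulate the resulting dump and show it round-trips; a restore is
#    just feeding these quads to a bulk loader against a fresh location.
dump = gzip.compress(b'<urn:x:s> <urn:x:p> "o" <urn:x:g> .\n')
quads = gzip.decompress(dump).decode("utf-8").splitlines()
```

The key design point is that the backup is a plain data dump, not a binary snapshot, so any N-Quads-capable loader can rebuild the database from it.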


