Re: jena-fuseki UI in podman execution (2nd effort without attachments)

2024-02-11 Thread jaanam
Hi Andy, it really seems that there has been a UI update, as the older versions 
look correct. The problem with the latest jena-fuseki UI is that I have some 
hundred named graphs and I cannot access them through the latest UI.

What is Vue?

Br, Jaana 

> 09.02.2024 13.37 EET Andy Seaborne  kirjoitti:
> 
>  
> Hi Jaana,
> 
> Glad you got it sorted out.
> 
> The Fuseki UI does not do anything special about browser caches. There 
> was a major UI update with implementing it in Vue and all the HTML 
> assets that go with that.
> 
>  Andy
> 
> On 09/02/2024 05:37, jaa...@kolumbus.fi wrote:
> > Hi, I just noticed that it's not a question about podman or docker but about 
> > browser cache. After deleting everything in browser cache I managed to get 
> > the correct user interface when running stain/jena-fuseki:3.14.0 and 
> > stain/jena-fuseki:4.0.0 with both podman and docker, but when I tried the 
> > latest stain/jena-fuseki (4.8.0) I got the incorrect interface (shown here 
> > https://github.com/jamietti/jena/blob/main/fuseki-podman.png).
> > 
> > Jaana M
> > 
> > 
> >> 08.02.2024 13.23 EET jaa...@kolumbus.fi kirjoitti:
> >>
> >>   
> >> Hi, I've been running jena-fuseki with docker:
> >>   
> >> docker run -p 3030:3030 -e ADMIN_PASSWORD=pw123 stain/jena-fuseki
> >>   
> >> and rootless podman:
> >>   
> >> podman run -p 3030:3030 -e ADMIN_PASSWORD=pw123 docker.io/stain/jena-fuseki
> >>   
> >> when executing the same version 4.8.0 of jena-fuseki with podman, the UI 
> >> looks totally different from the UI of the instance executed with docker.
> >>   
> >> see file fuseki-podman.png 
> >> https://github.com/jamietti/jena/blob/main/fuseki-podman.png in 
> >> https://github.com/jamietti/jena/
> >> What can cause this problem?
> >>   
> >> Br, Jaana M


Re: jena-fuseki UI in podman execution (2nd effort without attachments)

2024-02-11 Thread jaanam
Hi, thanks for your suggestion regarding secoresearch/fuseki! I must try it!

Br, Jaana
 
> 10.02.2024 17.20 EET Andrii Berezovskyi  kirjoitti:
> 
>  
> A bit unrelated, but I could also recommend secoresearch/fuseki image, which 
> is maintained by Jouni Tuominen and is currently at Jena 4.10.0 & JDK 21.0.2.
> 
> –Andrew.
> 
> On 9 Feb 2024, at 12:37, Andy Seaborne  wrote:
> 
> Hi Jaana,
> 
> Glad you got it sorted out.
> 
> The Fuseki UI does not do anything special about browser caches. There was a 
> major UI update with implementing it in Vue and all the HTML assets that go 
> with that.
> 
>Andy
> 
> On 09/02/2024 05:37, jaa...@kolumbus.fi wrote:
> Hi, I just noticed that it's not a question about podman or docker but about 
> browser cache. After deleting everything in browser cache I managed to get 
> the correct user interface when running stain/jena-fuseki:3.14.0 and 
> stain/jena-fuseki:4.0.0 with both podman and docker, but when I tried the 
> latest stain/jena-fuseki (4.8.0) I got the incorrect interface (shown here 
> https://github.com/jamietti/jena/blob/main/fuseki-podman.png).
> Jaana M
> 08.02.2024 13.23 EET jaa...@kolumbus.fi kirjoitti:
> 
>  Hi, I've been running jena-fuseki with docker:
>  docker run -p 3030:3030 -e ADMIN_PASSWORD=pw123 stain/jena-fuseki
>  and rootless podman:
>  podman run -p 3030:3030 -e ADMIN_PASSWORD=pw123 docker.io/stain/jena-fuseki
> when executing the same version 4.8.0 of jena-fuseki with podman, the UI looks 
> totally different from the UI of the instance executed with docker.
>  see file fuseki-podman.png 
> https://github.com/jamietti/jena/blob/main/fuseki-podman.png in 
> https://github.com/jamietti/jena/
> What can cause this problem?
>  Br, Jaana M


Re: jena-fuseki UI in podman execution

2024-02-11 Thread jaanam
Hi, thanks for your suggestion: "I would build the docker image from source 
and run that under both for a clean test."
I must try it,

Jaana

> 12.02.2024 00.32 EET Justin  kirjoitti:
> 
>  
> I would build the docker image from source and run that under both for a
> clean test.
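>
> For illustration, a minimal sketch of such a clean test (assuming the
> image's source lives at https://github.com/stain/jena-docker, which is
> where the stain/jena-fuseki image appears to be built from; verify
> before relying on it):
>
> git clone https://github.com/stain/jena-docker.git
> cd jena-docker
> # build the same Dockerfile once with each engine, then run both
> # (different host port for podman so both can run at once)
> docker build -t jena-fuseki-local jena-fuseki
> podman build -t jena-fuseki-local jena-fuseki
> docker run -p 3030:3030 -e ADMIN_PASSWORD=pw123 jena-fuseki-local
> podman run -p 3031:3030 -e ADMIN_PASSWORD=pw123 localhost/jena-fuseki-local
>
> Comparing two locally-built instances takes the registry copy and any
> cached layers out of the equation.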
> 
> On Thu, Feb 8, 2024, 4:27 AM Rob @ DNR  wrote:
> 
> > Hi
> >
> > This list does not permit attachments so we can’t see your screenshots,
> > can you upload them to some public image hosting somewhere and link to them?
> >
> > Thanks,
> >
> > Rob
> >
> > From: jaa...@kolumbus.fi 
> > Date: Thursday, 8 February 2024 at 08:48
> > To: users@jena.apache.org 
> > Subject: jena-fuseki UI in podman execution
> > Hi, I've been running jena-fuseki with docker:
> >
> > docker run -p 3030:3030 -e ADMIN_PASSWORD=pw123 stain/jena-fuseki
> >
> > and rootless podman:
> >
> > podman run -p 3030:3030 -e ADMIN_PASSWORD=pw123
> > docker.io/stain/jena-fuseki
> >
> > when executing the same version 4.8.0 of jena-fuseki with podman, the UI
> > looks totally different from the UI of the instance executed with docker.
> >
> > See attachment for the UI of the podman execution.
> >
> > What can cause this problem?
> >
> > Br, Jaana M
> >
> >
> >
> >
> >


Re: jena-fuseki UI in podman execution (2nd effort without attachments)

2024-02-08 Thread jaanam
Hi, I just noticed that it's not a question about podman or docker but about 
browser cache. After deleting everything in browser cache I managed to get the 
correct user interface when running stain/jena-fuseki:3.14.0 and 
stain/jena-fuseki:4.0.0 with both podman and docker, but when I tried the latest 
stain/jena-fuseki (4.8.0) I got the incorrect interface (shown here 
https://github.com/jamietti/jena/blob/main/fuseki-podman.png).

Jaana M


> 08.02.2024 13.23 EET jaa...@kolumbus.fi kirjoitti:
> 
>  
> Hi, I've been running jena-fuseki with docker:
>  
> docker run -p 3030:3030 -e ADMIN_PASSWORD=pw123 stain/jena-fuseki
>  
> and rootless podman:
>  
> podman run -p 3030:3030 -e ADMIN_PASSWORD=pw123 docker.io/stain/jena-fuseki
>  
> when executing the same version 4.8.0 of jena-fuseki with podman, the UI looks 
> totally different from the UI of the instance executed with docker.
>  
> see file fuseki-podman.png 
> https://github.com/jamietti/jena/blob/main/fuseki-podman.png in 
> https://github.com/jamietti/jena/
> What can cause this problem?
>  
> Br, Jaana M


jena-fuseki UI in podman execution (2nd effort without attachments)

2024-02-08 Thread jaanam
Hi, I've been running jena-fuseki with docker:
 
docker run -p 3030:3030 -e ADMIN_PASSWORD=pw123 stain/jena-fuseki
 
and rootless podman:
 
podman run -p 3030:3030 -e ADMIN_PASSWORD=pw123 docker.io/stain/jena-fuseki
 
when executing the same version 4.8.0 of jena-fuseki with podman, the UI looks 
totally different from the UI of the instance executed with docker.
 
see file fuseki-podman.png 
https://github.com/jamietti/jena/blob/main/fuseki-podman.png in 
https://github.com/jamietti/jena/
What can cause this problem?
 
Br, Jaana M
 
 
 

jena-fuseki UI in podman execution

2024-02-08 Thread jaanam
Hi, I've been running jena-fuseki with docker:
 
docker run -p 3030:3030 -e ADMIN_PASSWORD=pw123 stain/jena-fuseki
 
and rootless podman:
 
podman run -p 3030:3030 -e ADMIN_PASSWORD=pw123 docker.io/stain/jena-fuseki
 
when executing the same version 4.8.0 of jena-fuseki with podman, the UI looks 
totally different from the UI of the instance executed with docker.
 
See attachment for the UI of the podman execution.
 
What can cause this problem?
 
Br, Jaana M
 
 
 
 

Re: Combining sparql queries to speed up the calling process ?

2023-05-09 Thread jaanam

Hello,

thanks for encouraging me, just got it working as you said,

Jaana

Andy Seaborne kirjoitti 6.5.2023 18:55:

On 05/05/2023 11:50, jaa...@kolumbus.fi wrote:
Thanks for your answer, but I still don't understand how to combine 
those two queries.


I'm not saying they can simply be joined together.

If the FROM tilasto:$STAT is avoided, and the other FROM can be
removed and GRAPH used then there is probably a single query. That
depends on the data and data model.

Andy



If I put them like this, the jena-fuseki UI doesn't accept the line "{ 
graph tilasto:?ng", because ?ng comes from the 1st subquery.


It may take a LATERAL join but

BIND ( make IRI with tilasto: and ?ng AS ?gn )
GRAPH ?gn

is a possible way to modify the query.

It does all depend on the data and data model which you are the expert 
for.
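
For illustration, a minimal sketch of that shape, not a verified answer:
it assumes tilasto: expands to http://www.example.org/tilasto/ (as in the
combine scripts posted in other threads) and that ?t_id is stored as the
same literal in both graphs.

SELECT *
WHERE {
  GRAPH stat:outputchannel {
    ?subject outchl:refersToTable ?t_id .
    ?subject outchl:refersToNamedGraph ?ng .
    ?subject outchl:hasOutputFileId ?o_id .
    FILTER (regex(?o_id, "of_001"))
  }
  # build the graph IRI from ?ng instead of writing tilasto:?ng
  BIND (IRI(CONCAT("http://www.example.org/tilasto/", STR(?ng))) AS ?gn)
  GRAPH ?gn {
    ?pxfile pxt:tableId ?t_id .
    ?pxfile pxt:isPresentationOf ?cube .
    ?cube dc:description ?title_fi .
    ?cube cubemeta:refersToOutputFile ?of .
  }
}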





SELECT *
WHERE {
  {
    {
      GRAPH stat:outputchannel
      {
        ?subject outchl:refersToTable ?t_id .
        ?subject outchl:refersToNamedGraph ?ng .
        ?subject outchl:hasOutputFileId ?o_id.
        filter (regex(?o_id,"of_001"))
      }
    }
  }

  { graph tilasto:?ng
    {
      ?pxfile pxt:tableId "?t_id".
      ?pxfile pxt:isPresentationOf ?cube.
      ?cube dc:description ?title_fi.
      ?cube cubemeta:refersToOutputFile ?of.
    }
  }
  { graph stat:outputchannel
    {
      ?subject outchl:hasOutputFileId ?of.
      ?subject outchl:refersToOutputChannel ?channel_prefix .
      ?subject outchl:refersToTable ?t_id .
      ?subject outchl:refersToNamedGraph ?ng .
      ?subject outchl:hasOutputFileId ?o_id.
    }
  }
  { graph stat:tabAdmin
    {
      ?kanta outchl:hasOutputChannelId ?channel_prefix .
      ?kanta outchl:directoryPathRoot ?directoryPathRoot .
    }
  }
}

Can you help?

Br, Jaana

Andy Seaborne kirjoitti 4.5.2023 13:14:

On 04/05/2023 07:18, jaa...@kolumbus.fi wrote:

Andy Seaborne kirjoitti 4.5.2023 00:24:

Do you have to use FROM in the second query?


I don't know how to present it because in the 2nd query I'm querying 
three named graphs, where the third one ($STAT) should be replaced 
with the result of the 1st query (?ng)




But do you need a merged graph or can you use GRAPH? FROM of multiple
graphs may be a significant cost.  (It precludes TDB executing more
directly on basic graph patterns.)

With GRAPH ?g you can apply a condition to the ?g.
And that means you can combine the queries which might help - to know,
needs an experiment.

---

3000 queries in 5 mins is 100ms a query, including client and server 
overheads.


Are you doing the 3000 queries in parallel? A bit of parallelism might
save elapsed time (start with parallel = 2).
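
For instance, a minimal sketch of client-side parallelism with curl and
xargs (the endpoint, file names and query template are placeholders, not
from this thread):

  # pairs.txt holds one "STAT RDF_ID" pair per line, from the 1st query;
  # xargs -P 2 keeps two requests in flight at a time
  xargs -n 2 -P 2 sh -c '
    sed -e "s/\$STAT/$0/" -e "s/\$RDF_ID/$1/" query2.rq |
      curl -s -H "Content-Type: application/sparql-query" \
           --data-binary @- http://localhost:3030/ds/query
  ' < pairs.txt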

    Andy


Br, Jaana



On 03/05/2023 17:58, jaa...@kolumbus.fi wrote:

Hello,

I have the two queries below which I run from my code so that the 
1st query returns about 3000  ?ng and ?t_id pairs which will then 
be used in the second query in the place of $STAT and $RDF_ID. So 
I'm calling the second query in a loop about 3000 times.


I've noticed that it is time consuming; it takes about 4-5 
minutes. How could I combine the queries so that I could get all 
the information with just one call?


$PREFIXS
 SELECT *
 WHERE
   {
   {
    GRAPH stat:outputchannel
 {
 ?subject outchl:refersToTable ?t_id .
    ?subject outchl:refersToNamedGraph ?ng .
   ?subject outchl:hasOutputFileId ?o_id.
 filter (regex(?o_id,"of_001"))
 }
 }
    }


and

$PREFIXS

    SELECT *
    FROM stat:outputchannel
    FROM stat:tabAdmin
    FROM tilasto:$STAT
    WHERE {
  ?pxfile pxt:tableId "$RDF_ID".
    ?pxfile pxt:hasStatus ?status.
  ?pxfile pxt:hasFrequency ?frequency.
  ?pxfile pxt:isPresentationOf ?cube.
  ?cube dc:description ?title_fi.
  ?cube dc:language ?language.
  ?cube cubemeta:hasStatisticalProgramme 
?statisticalProgram.

  ?cube cubemeta:lastUpdated ?lastUpdated.
  ?cube cubemeta:refersToOutputFile ?of.
  ?subject outchl:hasOutputFileId ?of.
  ?subject outchl:refersToOutputChannel ?channel_prefix .
  ?kanta outchl:hasOutputChannelId ?channel_prefix .
  ?kanta outchl:directoryPathRoot ?directoryPathRoot .
  ?kanta a outchl_ont:OutputChannel .
    filter(?language ="fi"^^xsd:language)
  filter (lang(?title_fi) = "fi")
  filter ( langMatches(lang(?directoryPathRoot),"fi") )
}

Br jaana M


Re: Combining sparql queries to speed up the calling process ?

2023-05-05 Thread jaanam
Thanks for your answer, but I still don't understand how to combine those 
two queries.


If I put them like this, the jena-fuseki UI doesn't accept the line "{ graph 
tilasto:?ng", because ?ng comes from the 1st subquery.



SELECT *
WHERE {
  {
    {
      GRAPH stat:outputchannel
      {
        ?subject outchl:refersToTable ?t_id .
        ?subject outchl:refersToNamedGraph ?ng .
        ?subject outchl:hasOutputFileId ?o_id.
        filter (regex(?o_id,"of_001"))
      }
    }
  }

  { graph tilasto:?ng
    {
      ?pxfile pxt:tableId "?t_id".
      ?pxfile pxt:isPresentationOf ?cube.
      ?cube dc:description ?title_fi.
      ?cube cubemeta:refersToOutputFile ?of.
    }
  }
  { graph stat:outputchannel
    {
      ?subject outchl:hasOutputFileId ?of.
      ?subject outchl:refersToOutputChannel ?channel_prefix .
      ?subject outchl:refersToTable ?t_id .
      ?subject outchl:refersToNamedGraph ?ng .
      ?subject outchl:hasOutputFileId ?o_id.
    }
  }
  { graph stat:tabAdmin
    {
      ?kanta outchl:hasOutputChannelId ?channel_prefix .
      ?kanta outchl:directoryPathRoot ?directoryPathRoot .
    }
  }
}

Can you help?

Br, Jaana

Andy Seaborne kirjoitti 4.5.2023 13:14:

On 04/05/2023 07:18, jaa...@kolumbus.fi wrote:

Andy Seaborne kirjoitti 4.5.2023 00:24:

Do you have to use FROM in the second query?


I don't know how to present it because in the 2nd query I'm querying 
three named graphs, where the third one ($STAT) should be replaced 
with the result of the 1st query (?ng)




But do you need a merged graph or can you use GRAPH? FROM of multiple
graphs may be a significant cost.  (It precludes TDB executing more
directly on basic graph patterns.)

With GRAPH ?g you can apply a condition to the ?g.
And that means you can combine the queries which might help - to know,
needs an experiment.

---

3000 queries in 5 mins is 100ms a query, including client and server 
overheads.


Are you doing the 3000 queries in parallel? A bit of parallelism might
save elapsed time (start with parallel = 2).

Andy


Br, Jaana



On 03/05/2023 17:58, jaa...@kolumbus.fi wrote:

Hello,

I have the two queries below which I run from my code so that the 
1st query returns about 3000  ?ng and ?t_id pairs which will then be 
used in the second query in the place of $STAT and $RDF_ID. So I'm 
calling the second query in a loop about 3000 times.


I've noticed that it is time consuming; it takes about 4-5 minutes. 
How could I combine the queries so that I could get all the 
information with just one call?


$PREFIXS
 SELECT *
 WHERE
   {
   {
    GRAPH stat:outputchannel
 {
 ?subject outchl:refersToTable ?t_id .
    ?subject outchl:refersToNamedGraph ?ng .
   ?subject outchl:hasOutputFileId ?o_id.
 filter (regex(?o_id,"of_001"))
 }
 }
    }


and

$PREFIXS

    SELECT *
    FROM stat:outputchannel
    FROM stat:tabAdmin
    FROM tilasto:$STAT
    WHERE {
  ?pxfile pxt:tableId "$RDF_ID".
    ?pxfile pxt:hasStatus ?status.
  ?pxfile pxt:hasFrequency ?frequency.
  ?pxfile pxt:isPresentationOf ?cube.
  ?cube dc:description ?title_fi.
  ?cube dc:language ?language.
  ?cube cubemeta:hasStatisticalProgramme 
?statisticalProgram.

  ?cube cubemeta:lastUpdated ?lastUpdated.
  ?cube cubemeta:refersToOutputFile ?of.
  ?subject outchl:hasOutputFileId ?of.
  ?subject outchl:refersToOutputChannel ?channel_prefix .
  ?kanta outchl:hasOutputChannelId ?channel_prefix .
  ?kanta outchl:directoryPathRoot ?directoryPathRoot .
  ?kanta a outchl_ont:OutputChannel .
    filter(?language ="fi"^^xsd:language)
  filter (lang(?title_fi) = "fi")
  filter ( langMatches(lang(?directoryPathRoot),"fi") )
}

Br jaana M


Re: Combining sparql queries to speed up the calling process ?

2023-05-04 Thread jaanam

Andy Seaborne kirjoitti 4.5.2023 00:24:

Do you have to use FROM in the second query?


I don't know how to present it because in the 2nd query I'm querying 
three named graphs, where the third one ($STAT) should be replaced with 
the result of the 1st query (?ng)


Br, Jaana



On 03/05/2023 17:58, jaa...@kolumbus.fi wrote:

Hello,

I have the two queries below which I run from my code so that the 1st 
query returns about 3000  ?ng and ?t_id pairs which will then be used 
in the second query in the place of $STAT and $RDF_ID. So I'm calling 
the second query in a loop about 3000 times.


I've noticed that it is time consuming; it takes about 4-5 minutes. 
How could I combine the queries so that I could get all the 
information with just one call?


$PREFIXS
     SELECT *
     WHERE
   {
   {
    GRAPH stat:outputchannel
     {
     ?subject outchl:refersToTable ?t_id .
    ?subject outchl:refersToNamedGraph ?ng .
   ?subject outchl:hasOutputFileId ?o_id.
     filter (regex(?o_id,"of_001"))
     }
     }
    }


and

$PREFIXS

    SELECT *
    FROM stat:outputchannel
    FROM stat:tabAdmin
    FROM tilasto:$STAT
    WHERE {
  ?pxfile pxt:tableId "$RDF_ID".
    ?pxfile pxt:hasStatus ?status.
  ?pxfile pxt:hasFrequency ?frequency.
  ?pxfile pxt:isPresentationOf ?cube.
  ?cube dc:description ?title_fi.
  ?cube dc:language ?language.
  ?cube cubemeta:hasStatisticalProgramme ?statisticalProgram.
  ?cube cubemeta:lastUpdated ?lastUpdated.
  ?cube cubemeta:refersToOutputFile ?of.
  ?subject outchl:hasOutputFileId ?of.
  ?subject outchl:refersToOutputChannel ?channel_prefix .
  ?kanta outchl:hasOutputChannelId ?channel_prefix .
  ?kanta outchl:directoryPathRoot ?directoryPathRoot .
  ?kanta a outchl_ont:OutputChannel .
    filter(?language ="fi"^^xsd:language)
  filter (lang(?title_fi) = "fi")
  filter ( langMatches(lang(?directoryPathRoot),"fi") )
}

Br jaana M


Combining sparql queries to speed up the calling process ?

2023-05-03 Thread jaanam

Hello,

I have the two queries below which I run from my code so that the 1st 
query returns about 3000  ?ng and ?t_id pairs which will then be used in 
the second query in the place of $STAT and $RDF_ID. So I'm calling the 
second query in a loop about 3000 times.


I've noticed that it is time consuming; it takes about 4-5 minutes. How 
could I combine the queries so that I could get all the information with 
just one call?


$PREFIXS
SELECT *
WHERE
  {
  {
   GRAPH stat:outputchannel
{
?subject outchl:refersToTable ?t_id .
?subject outchl:refersToNamedGraph ?ng .
?subject outchl:hasOutputFileId ?o_id.
filter (regex(?o_id,"of_001"))
}
}
   }


and

$PREFIXS

   SELECT *
   FROM stat:outputchannel
   FROM stat:tabAdmin
   FROM tilasto:$STAT
   WHERE {
 ?pxfile pxt:tableId "$RDF_ID".
 ?pxfile pxt:hasStatus ?status.
 ?pxfile pxt:hasFrequency ?frequency.
 ?pxfile pxt:isPresentationOf ?cube.
 ?cube dc:description ?title_fi.
 ?cube dc:language ?language.
 ?cube cubemeta:hasStatisticalProgramme ?statisticalProgram.
 ?cube cubemeta:lastUpdated ?lastUpdated.
 ?cube cubemeta:refersToOutputFile ?of.
 ?subject outchl:hasOutputFileId ?of.
 ?subject outchl:refersToOutputChannel ?channel_prefix .
 ?kanta outchl:hasOutputChannelId ?channel_prefix .
 ?kanta outchl:directoryPathRoot ?directoryPathRoot .
 ?kanta a outchl_ont:OutputChannel .
 filter(?language ="fi"^^xsd:language)
 filter (lang(?title_fi) = "fi")
 filter ( langMatches(lang(?directoryPathRoot),"fi") )
}

Br jaana M


Atomic sparql Insert

2023-01-03 Thread jaanam
Hello, I have about 10 000 variables in my Jena Fuseki v. 3.7.0 database,
all of them created using the INSERT DATA command below.

I just noticed that four of those ~10 000 entries are missing the third
triple:

gsimsf:EnumeratedValueDomain\/$TECH_NAME a
gsimsf_ont:EnumeratedValueDomain ;
        gsimsf:hasStatId "$STAT_ID" .

I've been considering INSERT commands to be atomic, but now it looks like
some of them have been disturbed; can that be possible? And if yes, how
can this be fixed?

$PREFIXS

INSERT DATA
{ GRAPH tilasto:$STAT
  {
    cubemeta:CubeMeta\/$RDF_ID cubemeta:hasVariable gsimsf:Variable\/$TECH_NAME .

    gsimsf:Variable\/$TECH_NAME a gsimsf_ont:Variable ;
        rdfs:label "$TITLE_FI"@fi ;
        rdfs:label "$TITLE_SV"@sv ;
        rdfs:label "$TITLE_EN"@en ;
        dc:description "$DESC_FI"@fi ;
        dc:description "$DESC_SV"@sv ;
        dc:description "$DESC_EN"@en ;
        gsimsf:hasEnumeratedValueDomain gsimsf:EnumeratedValueDomain\/$TECH_NAME .

    gsimsf:EnumeratedValueDomain\/$TECH_NAME a gsimsf_ont:EnumeratedValueDomain ;
        gsimsf:hasStatId "$STAT_ID" .

    pxt:PxDimension\/$RDF_ID\/$TECH_NAME a pxt_ont:PxDimension ;
        pxt:isPresentationOfVariable gsimsf:Variable\/$TECH_NAME ;
        pxt:hasSequenceNumber $NRO ;
        pxt:hasVariableType "$VAR_TYPE" ;
        pxt:isHeading "$PX_DEMENTION"^^xsd:boolean ;
        pxt:hasCodedVariable pxt:PxCodedVariable\/$RDF_ID\/$TECH_NAME .

    pxt:PxCodedVariable\/$RDF_ID\/$TECH_NAME a pxt_ont:PxCodedVariable ;
        pxt:hasMap "$MAPS" ;
        pxt:hasPxDomain "$DOMAIN" ;
        pxt:hasElimination "$ELIMINATION" ;
        pxt:hasScaleType "$SCALETYPE" ;
        pxt:prependCode "$PREPENDCODE"^^xsd:boolean ;
        pxt:isPresentationOfEnumeratedValueDomain gsimsf:EnumeratedValueDomain\/$TECH_NAME .

    pxt:PxFile\/$RDF_ID pxt:hasPxDimension pxt:PxDimension\/$RDF_ID\/$TECH_NAME .
  }
}
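
For what it's worth, a minimal sketch of how such broken entries could be
located (prefixes as in $PREFIXS; the missing $STAT_ID values to re-insert
would have to come from the source system):

SELECT ?g ?evd
WHERE {
  GRAPH ?g {
    ?evd a gsimsf_ont:EnumeratedValueDomain .
    # entries whose hasStatId triple is missing
    FILTER NOT EXISTS { ?evd gsimsf:hasStatId ?id }
  }
}

Each hit can then be repaired with a small INSERT DATA into its graph.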

Br, Jaana M



Re: docker-file for jena-fuseki 4.3.1

2021-12-16 Thread jaanam

Hi,

Which aspect of the UI are you interested in - the query/upload part or 
the database administration?


Both.

Br Jaana

Andy Seaborne kirjoitti 16.12.2021 16:02:

On 16/12/2021 05:34, jaa...@kolumbus.fi wrote:

Hi,

found a docker-file for jena-fuseki 4.3.1 from

https://repo1.maven.org/maven2/org/apache/jena/jena-fuseki-docker/4.3.1/

It comes without the UI. Will there also be a Dockerfile with the UI?

Br Jaana


Yes. Sometime.

WIP:
https://video.twimg.com/tweet_video/FF4KU5kUUAMKZ-q.mp4

Which aspect of the UI are you interested in - the query/upload part
or the database administration?

Andy

You could use the Dockerfile with the fulljar standalone server which
has the UI.


docker-file for jena-fuseki 4.3.1

2021-12-15 Thread jaanam

Hi,

found a docker-file for jena-fuseki 4.3.1 from

https://repo1.maven.org/maven2/org/apache/jena/jena-fuseki-docker/4.3.1/

It comes without the UI. Will there also be a Dockerfile with the UI?

Br Jaana


Re: Information about Apache Jena and Log4j2 vulnerability.

2021-12-14 Thread jaanam

Hello,

Sorry for asking a stupid question, but I'm not sure whether it is enough to 
have just the below setting inside the docker container that runs the 
blankdots/jena-fuseki 3.17 image pulled from Docker Hub.


C:\Users\miettinj>docker exec -it 1a7e   /bin/bash
root@1a7e400c71aa:/jena-fuseki# echo $JVM_ARGS
-Xmx2g -Dlog4j2.formatMsgNoLookups=true
root@1a7e400c71aa:/jena-fuseki#

Or should I also change the run command as explained below?
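
For reference, a sketch of setting it at container start (image tag from
this thread; whether the image's start script passes JVM_ARGS on to java
is an assumption to verify against that image):

docker run -p 3030:3030 \
  -e JVM_ARGS="-Xmx2g -Dlog4j2.formatMsgNoLookups=true" \
  blankdots/jena-fuseki:fuseki3.17.0

# then confirm the running server process actually got the flag
# (assumes the server is PID 1 in the container)
docker exec <container> sh -c "tr '\0' ' ' < /proc/1/cmdline"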

Br, Jaana


Andy Seaborne kirjoitti 10.12.2021 16:55:

This message is about the effect of CVE-2021-44228 (log4j2) on Fuseki.

https://nvd.nist.gov/vuln/detail/CVE-2021-44228

Jena ships log4j2 in Fuseki and the command line tools.

The vulnerability of log4j2 does impact Fuseki 3.15 - 3.17, and 4.x.

Remote execution is only possible with older versions of Java.

Java versions Java 8u121 and Java 11.0.1, and later, set
"com.sun.jndi.rmi.object.trustURLCodebase"
and
"com.sun.jndi.cosnaming.object.trustURLCodebase"

to "false" protecting against remote code execution by default.


The workaround of setting "-Dlog4j2.formatMsgNoLookups=true" works
with all affected Fuseki versions:

JVM_ARGS="-Dlog4j2.formatMsgNoLookups=true" ./fuseki-server 


Note that Apache Jena 4.2.0 addresses an unrelated Jena-specific CVE
https://nvd.nist.gov/vuln/detail/CVE-2021-39239

We will release Jena 4.3.1 with upgraded log4j2.

Andy
on behalf of the Jena PMC


Re: difference between 3.13 and 3.17

2021-08-18 Thread jaanam

I'll look some more _sometime_ but to be fair to everyone, it has to
fit around other reports.
As suspected in previous e-mails, it seems that the problem is in the 
script: just running this update


   # Alter: Variable as subject
   delete{
     graph ?g {
       ?s_before ?p ?o.
     }
   }
   insert{
     graph ?g {
       ?s_after ?p ?o.
     }
   }
   where{
     {
       graph ?g { }.
     }
     graph ?g {
       ?s_before a 
.
       ?s_before ?p ?o.
       bind(iri(concat("http://www.example.org/rdf/data/gsimsf/Variable/",
           strafter(str(?g), "http://www.example.org/tilasto/"), "/",
           strafter(str(?s_before),
           "http://www.example.org/rdf/data/gsimsf/Variable/"))) as ?s_after)
     }
   };

in the source server leads to different results in the in_memory and 
persistent datasets. I just have to understand how to change this part of 
the script according to the hints in the previous discussions!


Thanks a lot for your patience with this issue!

Jaana

Andy Seaborne kirjoitti 17.8.2021 22:12:

On 17/08/2021 06:54, jaa...@kolumbus.fi wrote:
Hello, you were right, there were still unnecessary graphs in my 
source file. They have been removed in source2.zip.


The difference between datasets 'target_in_memory'(=NG combination was 
done in in_memory dataset) and 'target_persistent' (=NG combination 
was done in persistent dataset) is that 'target_in_memory' dataset 
doesn't have predicate 
 at all.


Unfortunately my sparql-knowledge is not good (as you must have 
noticed), but I'm still responsible for this NG combination stuff and 
the combineNGs.sparql script, which was coded by a guy who has left the 
office.


This is going to be difficult.

Aside from the difference between datasets, the WHERE clauses that use
?g inside "GRAPH ?g" are wrong - older versions of Jena execute
incorrectly and it is now fixed. The inner ?g is undefined in the BIND
and that leads to missing results.

So - regardless of anything else - that's going to need fixing in your
script and that's going to need verification of the expected answers.

---

The SPARQL script is a series of SPARQL Update statements separated by
semicolons - bisect to find the first update statement that
shows a difference.  Chop the last half of them off, see if the
modified script produces differences. If yes, repeat. If no, chop the
last quarter off.

Also replacing the first 20 lines with the rewrite I posted makes it
easier - no need for servers, execute with the command line "update"
or "tdbupdate".

Similarly bisecting the data to find a smaller data sample.

Bisecting is easier when the data and intent is understood and I don't
know your application.

Also attached is a Word document, changes.zip, in which the changes (data 
missing from the 'target_in_memory' dataset) are marked in yellow.


I'll look some more _sometime_ but to be fair to everyone, it has to
fit around other reports.

Andy



Br Jaana


Andy Seaborne kirjoitti 16.8.2021 19:35:
It's a bit smaller but I notice it still has the  graph ?g { }  and the
data still has a default graph.

Does it really need all that data? I'd be surprised if it takes more
than one subject and its triples to show a difference.

There are multiple update steps - which is the first that makes a 
difference?


And which outcome is right and which is wrong?

Not knowing the right answer makes it much harder and much more time
consuming to work out what is going on.

The data load is the same as a PUT of the data without the default
graph - so the first 20 lines can be done with a "curl -XPUT".

    Andy

On 16/08/2021 11:09, jaa...@kolumbus.fi wrote:

Hello,

sorry for providing too big an amount of data for reproducing the 
problem.


Here's a much smaller data set for the source and a bit smaller script for 
combining the NGs.


and the steps to reproduce:

1) start source apache-jena-fuseki-server on port 3030

2) start target apache-jena-fuseki-3.17.0-server on port 3031

3) unzip and upload the attachment source.zip into source 
apache-jena-fuseki-server on port 3030


curl -XPOST --header 'Content-Type: application/trig' --data-binary 
@source.trig http://localhost:3030/source


4) create one in-memory dataset (e.g. target_in_memory) and one 
persistent (e.g. target_persistent) dataset on target 
apache-jena-fuseki-3.17.0-server running on port 3031


5) update the source apache-jena-fuseki-server and source dataset in 
combine_NGs2.sparql-script if needed


6) run combine_NGs2.sparql-script in in-memory dataset and 
persistent dataset of the target apache-jena-fuseki-3.17.0-server 
for instance in jena-fuseki GUI query tab


or using curl:

curl -i -H "Content-Type: application/sparql-update"  -X POST 
http://localhost:3031/target_in_memory/update --data-binary 
"@./combine_NGs2.sparql"


curl -i -H "Content-Type: application/sparql-update"  -X POST 
http://localhost:3031/target_persistent/update 

Re: difference between 3.13 and 3.17

2021-08-16 Thread jaanam
Hello, you were right, there were still unnecessary graphs in my source 
file. They have been removed in source2.zip.


The difference between datasets 'target_in_memory'(=NG combination was 
done in in_memory dataset) and 'target_persistent' (=NG combination was 
done in persistent dataset) is that 'target_in_memory' dataset doesn't 
have predicate 
 at all.


Unfortunately my sparql-knowledge is not good (as you must have 
noticed), but I'm still responsible for this NG combination stuff and the 
combineNGs.sparql script, which was coded by a guy who has left the 
office.


Also attached is a Word document, changes.zip, in which the changes (data 
missing from the 'target_in_memory' dataset) are marked in yellow.


Br Jaana


Andy Seaborne kirjoitti 16.8.2021 19:35:

It's a bit smaller but I notice it still has the  graph ?g { }  and the
data still has a default graph.

Does it really need all that data? I'd be surprised if it takes more
than one subject and its triples to show a difference.

There are multiple update steps - which is the first that makes a 
difference?


And which outcome is right and which is wrong?

Not knowing the right answer makes it much harder and much more time
consuming to work out what is going on.

The data load is the same as a PUT of the data without the default
graph - so the first 20 lines can be done with a "curl -XPUT".

Andy

On 16/08/2021 11:09, jaa...@kolumbus.fi wrote:

Hello,

sorry for providing too big an amount of data for reproducing the 
problem.


Here's a much smaller data set for the source and a bit smaller script for 
combining the NGs.


and the steps to reproduce:

1) start source apache-jena-fuseki-server on port 3030

2) start target apache-jena-fuseki-3.17.0-server on port 3031

3) unzip and upload the attachment source.zip into source 
apache-jena-fuseki-server on port 3030


curl -XPOST --header 'Content-Type: application/trig' --data-binary 
@source.trig http://localhost:3030/source


4) create one in-memory dataset (e.g. target_in_memory) and one 
persistent (e.g. target_persistent) dataset on target 
apache-jena-fuseki-3.17.0-server running on port 3031


5) update the source apache-jena-fuseki-server and source dataset in 
combine_NGs2.sparql-script if needed


6) run combine_NGs2.sparql-script in in-memory dataset and persistent 
dataset of the target apache-jena-fuseki-3.17.0-server for instance in 
jena-fuseki GUI query tab


or using curl:

curl -i -H "Content-Type: application/sparql-update"  -X POST 
http://localhost:3031/target_in_memory/update --data-binary 
"@./combine_NGs2.sparql"


curl -i -H "Content-Type: application/sparql-update"  -X POST 
http://localhost:3031/target_persistent/update --data-binary 
"@./combine_NGs2.sparql"


Compare the results. Even from the jena-fuseki GUI edit page, when opening 
the resulting default graphs in the editor tab window, it can be seen that 
the in-memory dataset has less data than the persistent one.


As I've said before, this issue didn't occur with 
apache-jena-fuseki-3.13.


And about your questions:


Where is the 3.13.0 server?


To notice that this doesn't happen with 3.13, just replace the 
3.17-server with 3.17 in my above steps. In my steps - I guess - the 
source server can be 3.13 or 3.17.




Does it need the data pulled from another server rather than execute on
already loaded data?


We didn't manage to combine the NGs within one server - we expected 
that jena would try to use a proxy or something like that...


Br, Jaana

Andy Seaborne kirjoitti 13.8.2021 23:02:

On 13/08/2021 12:03, jaa...@kolumbus.fi wrote:

Andy Seaborne kirjoitti 11.8.2021 17:38:

Hi there,

There isn't enough information to see what's happening.


Hello,

see steps to repeat the issue below.


I've got all the parts of the example - it's not minimal though.

What is a smaller amount of data and a shorter update script that shows 
the problem?


Does it need the data pulled from another server rather than execute on
already loaded data?

There is a data file of 363,559 quads (which has warnings), a SPARQL
update script of 241 lines.

To work out what is going on, someone has to reduce that large setup
to the part that causes the difference.

The first thing that script does is delete all the local data and pull
some, not all, data from the source server. Is that step necessary?

I don't believe it needs all the data and all the script to show a
difference nor that it needs to pull the data out of one server, and
put it in the local store in order to be different, why not just load
something directly?


The rest of the update does some kind of manipulation of the data - I
don't understand what it is trying to do - its purpose relates to the
data model.

You are in a much better place to reduce that large script to a
minimal one that shows a difference because it's your application.

Does it need all those steps together to show the difference or just
one of them?  (BTW each 

Re: difference between 3.13 and 3.17

2021-08-16 Thread jaanam

just fixing a typo in my e-mail below


To notice that this doesn't happen with 3.13, just replace the
3.17-server with 3.13 in my reproduction steps. In my steps - I guess - the
source server can be 3.13 or 3.17.


jaana

jaa...@kolumbus.fi kirjoitti 16.8.2021 13:09:

Hello,

sorry for providing too big an amount of data for reproducing the 
problem.


Here's a much smaller data set for the source and a bit smaller script for
combining the NGs.

and the steps to reproduce:

1) start source apache-jena-fuseki-server on port 3030

2) start target apache-jena-fuseki-3.17.0-server on port 3031

3) unzip and upload the attachment source.zip into source
apache-jena-fuseki-server on port 3030

curl -XPOST --header 'Content-Type: application/trig' --data-binary
@source.trig http://localhost:3030/source

4) create one in-memory dataset (e.g. target_in_memory) and one
persistent (e.g. target_persistent) dataset on target
apache-jena-fuseki-3.17.0-server running on port 3031

5) update the source apache-jena-fuseki-server and source dataset in
combine_NGs2.sparql-script if needed

6) run combine_NGs2.sparql-script in in-memory dataset and persistent
dataset of the target apache-jena-fuseki-3.17.0-server for instance in
jena-fuseki GUI query tab

or using curl:

curl -i -H "Content-Type: application/sparql-update"  -X POST
http://localhost:3031/target_in_memory/update --data-binary
"@./combine_NGs2.sparql"

curl -i -H "Content-Type: application/sparql-update"  -X POST
http://localhost:3031/target_persistent/update --data-binary
"@./combine_NGs2.sparql"

Compare the results. Even from the jena-fuseki GUI edit page, when opening
the resulting default graphs in the editor tab window, it can be seen that
the in-memory dataset has less data than the persistent one.

As I've said before, this issue didn't occur with 
apache-jena-fuseki-3.13.


And about your questions:


Where is the 3.13.0 server?


To notice that this doesn't happen with 3.13, just replace the
3.17-server with 3.17 in my above steps. In my steps - I guess - the
source server can be 3.13 or 3.17.



Does it need the data pulled from another server rather than execute on
already loaded data?


We didn't manage to combine the NGs within one server - we expected
that jena would try to use a proxy or something like that...

Br, Jaana

Andy Seaborne kirjoitti 13.8.2021 23:02:

On 13/08/2021 12:03, jaa...@kolumbus.fi wrote:

Andy Seaborne kirjoitti 11.8.2021 17:38:

Hi there,

There isn't enough information to see what's happening.


Hello,

see steps to repeat the issue below.


I've got all the parts of the example - it's not minimal though.

What is a smaller amount of data and a shorter update script that shows 
the problem?


Does it need the data pulled from another server rather than execute on
already loaded data?

There is a data file of 363,559 quads (which has warnings), a SPARQL
update script of 241 lines.

To work out what is going on, someone has to reduce that large setup
to the part that causes the difference.

The first thing that script does is delete all the local data and pull
some, not all, data from the source server. Is that step necessary?

I don't believe it needs all the data and all the script to show a
difference nor that it needs to pull the data out of one server, and
put it in the local store in order to be different, why not just load
something directly?


The rest of the update does some kind of manipulation of the data - I
don't understand what it is trying to do - its purpose relates to the
data model.

You are in a much better place to reduce that large script to a
minimal one that shows a difference because it's your application.

Does it need all those steps together to show the difference or just
one of them?  (BTW each update step is done independently: there'll be
a point where the answers start diverging.)

Looking at it though, the use of

where{
  {
graph ?g { }.
  }
  graph ?g {
.. some pattern ..
.. some BIND involving ?g ..
 }
}

is pretty suspect.
Omit the first part and put the BIND after the second:

Move the BIND to after the
  graph ?g {
.. some pattern ..
 }
 .. some BIND involving ?g ..
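
Concretely, a minimal sketch of that rewrite applied to the "Variable as
subject" step (untested; <VariableType> stands in for the class IRI that
the archive stripped from the posted script):

delete {
  graph ?g { ?s_before ?p ?o . }
}
insert {
  graph ?g { ?s_after ?p ?o . }
}
where {
  graph ?g {
    ?s_before a <VariableType> .
    ?s_before ?p ?o .
  }
  # BIND moved to after the GRAPH block, where ?g is bound
  bind(iri(concat("http://www.example.org/rdf/data/gsimsf/Variable/",
      strafter(str(?g), "http://www.example.org/tilasto/"), "/",
      strafter(str(?s_before),
               "http://www.example.org/rdf/data/gsimsf/Variable/")))
      as ?s_after)
};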

Andy






Where is the 3.13.0 server?


1) start source apache-jena-fuseki-server on port 3030

2) start target apache-jena-fuseki-3.17.0-server on port 3031

3) unzip and upload the attachment ds.zip into source 
apache-jena-fuseki-server on port 3030 using command


  curl -XPOST --header 'Content-Type: application/trig' 
--data-binary @ds.trig http://apache-jena-fuseki-server>:3030/


4) create one in-memory dataset and one persistent dataset on target 
apache-jena-fuseki-3.17.0-server


5) update the source apache-jena-fuseki-server and source dataset in 
combine_NGs.sparql-file


6) run combine_NGs.sparql-script in in-memory dataset and persistent 
dataset of the target apache-jena-fuseki-3.17.0-server


Run how?



7) run query

    SELECT ?subject ?predicate ?object
  WHERE {
   ?subject ?predicate ?object
  }


Re: difference between 3.13 and 3.17

2021-08-16 Thread jaanam

Hello,

sorry for providing too big an amount of data for reproducing the 
problem.


Here's a much smaller data set for the source and a bit smaller script for 
combining the NGs.


and the steps to reproduce:

1) start source apache-jena-fuseki-server on port 3030

2) start target apache-jena-fuseki-3.17.0-server on port 3031

3) unzip and upload the attachment source.zip into source 
apache-jena-fuseki-server on port 3030


curl -XPOST --header 'Content-Type: application/trig' --data-binary 
@source.trig http://localhost:3030/source


4) create one in-memory dataset (e.g. target_in_memory) and one 
persistent (e.g. target_persistent) dataset on target 
apache-jena-fuseki-3.17.0-server running on port 3031


5) update the source apache-jena-fuseki-server and source dataset in 
combine_NGs2.sparql-script if needed


6) run combine_NGs2.sparql-script in in-memory dataset and persistent 
dataset of the target apache-jena-fuseki-3.17.0-server for instance in 
jena-fuseki GUI query tab


or using curl:

curl -i -H "Content-Type: application/sparql-update"  -X POST 
http://localhost:3031/target_in_memory/update --data-binary 
"@./combine_NGs2.sparql"


curl -i -H "Content-Type: application/sparql-update"  -X POST 
http://localhost:3031/target_persistent/update --data-binary 
"@./combine_NGs2.sparql"


Compare the results. Even from the jena-fuseki GUI edit page, when opening 
the resulting default graphs in the editor tab window, it can be seen that 
the in-memory dataset has less data than the persistent one.


As I've said before, this issue didn't occur with 
apache-jena-fuseki-3.13.


And about your questions:


Where is the 3.13.0 server?


To notice that this doesn't happen with 3.13, just replace the 
3.17-server with 3.17 in my above steps. In my steps - I guess - the 
source server can be 3.13 or 3.17.




Does it need the data pulled from another server rather than execute on
already loaded data?


We didn't manage to combine the NGs within one server - we expected that 
jena would try to use a proxy or something like that...


Br, Jaana

Andy Seaborne kirjoitti 13.8.2021 23:02:

On 13/08/2021 12:03, jaa...@kolumbus.fi wrote:

Andy Seaborne kirjoitti 11.8.2021 17:38:

Hi there,

There isn't enough information to see what's happening.


Hello,

see steps to repeat the issue below.


I've got all the parts of the example - it's not minimal though.

What is a smaller amount of data and a shorter update script that shows 
the problem?


Does it need the data pulled from another server rather than execute on
already loaded data?

There is a data file of 363,559 quads (which has warnings), a SPARQL
update script of 241 lines.

To work out what is going on, someone has to reduce that large setup
to the part that causes the difference.

The first thing that script does is delete all the local data and pull
some, not all, data from the source server. Is that step necessary?

I don't believe it needs all the data and all the script to show a
difference nor that it needs to pull the data out of one server, and
put it in the local store in order to be different, why not just load
something directly?


The rest of the update does some kind of manipulation of the data - I
don't understand what it is trying to do - its purpose relates to the
data model.

You are in a much better place to reduce that large script to a
minimal one that shows a difference because it's your application.

Does it need all those steps together to show the difference or just
one of them?  (BTW each update step is done independently: there'll be
a point where the answers start diverging.)

Looking at it though, the use of

where{
  {
graph ?g { }.
  }
  graph ?g {
.. some pattern ..
.. some BIND involving ?g ..
 }
}

is pretty suspect.
Omit the first part and put the BIND after the second:

Move the BIND to after the
  graph ?g {
.. some pattern ..
 }
 .. some BIND involving ?g ..

Andy






Where is the 3.13.0 server?


1) start source apache-jena-fuseki-server on port 3030

2) start target apache-jena-fuseki-3.17.0-server on port 3031

3) unzip and upload the attachment ds.zip into source 
apache-jena-fuseki-server on port 3030 using command


  curl -XPOST --header 'Content-Type: application/trig' 
--data-binary @ds.trig http://apache-jena-fuseki-server>:3030/


4) create one in-memory dataset and one persistent dataset on target 
apache-jena-fuseki-3.17.0-server


5) update the source apache-jena-fuseki-server and source dataset in 
combine_NGs.sparql-file


6) run combine_NGs.sparql-script in in-memory dataset and persistent 
dataset of the target apache-jena-fuseki-3.17.0-server


Run how?



7) run query

    SELECT ?subject ?predicate ?object
  WHERE {
   ?subject ?predicate ?object
  }

in in-memory dataset and persistent dataset of the target 
apache-jena-fuseki-3.17.0-server and compare the results.


See attachments in_memory.png and persistent.png for my results after 
the above procedure.


That's screenshots of a 

Re: difference between 3.13 and 3.17

2021-08-13 Thread jaanam

Andy Seaborne kirjoitti 11.8.2021 17:38:

Hi there,

There isn't enough information to see what's happening.



Hello,

I don't know if this message was already received by the recipients, as I 
tried to send a 2 MB file as an attachment. Sorry for the inconvenience in 
that case!


Anyway, now the file has been stored in GitHub, so please see the steps to 
repeat the issue below.




1) start source apache-jena-fuseki-server on port 3030

2) start target apache-jena-fuseki-3.17.0-server on port 3031

3) upload the file ds.trig from https://github.com/jamietti/jena into 
source apache-jena-fuseki-server on port 3030 using command


  curl -XPOST --header 'Content-Type: application/trig' 
--data-binary @ds.trig http://apache-jena-fuseki-server>:3030/


4) create one in-memory dataset and one persistent dataset on target 
apache-jena-fuseki-3.17.0-server


5) update the source apache-jena-fuseki-server and source dataset in 
combine_NGs.sparql-file


6) run combine_NGs.sparql-script in in-memory dataset and persistent 
dataset of the target apache-jena-fuseki-3.17.0-server


7) run query

SELECT ?subject ?predicate ?object
  WHERE {
   ?subject ?predicate ?object
  }

in in-memory dataset and persistent dataset of the target 
apache-jena-fuseki-3.17.0-server and compare the results.


See attachments in_memory.png and persistent.png for my results after 
the above procedure.


Jaana




The first thing to do is dump, or Fuseki backup, the database from
each setup and see if they are the same.
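
For instance, a hedged sketch of that comparison (dataset names from the
reproduction steps; a plain GET on the dataset endpoint returning all
quads is standard Fuseki behaviour, but verify it on your version):

curl -s -H 'Accept: application/n-quads' \
     http://localhost:3031/target_in_memory > mem.nq
curl -s -H 'Accept: application/n-quads' \
     http://localhost:3031/target_persistent > tdb.nq
# blank node labels can differ between dumps, so sort first and expect
# bnode lines to need manual inspection
sort mem.nq > mem.sorted.nq ; sort tdb.nq > tdb.sorted.nq
diff mem.sorted.nq tdb.sorted.nq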

Then if they are, send a minimal reproducible example [1].
Something someone else can run.

Andy

[1]
https://stackoverflow.com/help/minimal-reproducible-example


On 11/08/2021 13:35, jaa...@kolumbus.fi wrote:

Hello,

My jena-fuseki database consists of several named graphs. In order to 
provide users a graphql-like interface to jena-fuseki I have to combine 
my NGs into one big default graph for HyperGraphQL 
(https://www.hypergraphql.org/) that provides the interface.


At some point the users started to get less data than before, and when 
I investigated the issue I noticed that this was after upgrading 
jena-fuseki from 3.13 to 3.17!


To combine the NGs I'm using the following command:

   curl -i -H "Content-Type: application/sparql-update"  -X POST 
http://:8061//update --data-binary 
"@./combine-NGs.sparql"


(see NGs_to_be_combined.txt for hint of my source database and 
combine-NGs.sparql as the executed script).


So I run the combine-NGs.sparql-script in the target jena host and the 
target dataset.


If I use an in-memory dataset as my target dataset I get only half of the 
triples compared to the amount of triples with a persistent dataset. 
This happens only with jena-fuseki 3.17.


In 3.13 I haven't seen this issue!

Br, Jaana




drop all;

# Copy all NGs into local
insert{
  graph ?g {
?s ?p ?o
  }
}
where{
  service :3030//query> {# original RDF-dataset
{
  graph ?g { }.
}
graph ?g {
  ?s ?p ?o
}
  }
};

# Alter: Variable as object
delete{
  graph ?g {
?s ?p ?o_before.
  }
}
insert{
  graph ?g {
?s ?p ?o_after.
  }
}
where{
  {
graph ?g { }.
  }
  graph ?g {
?o_before a .
?s ?p ?o_before.
bind(iri(concat("http://www.example.org/rdf/data/gsimsf/Variable/;, 
strafter(str(?g), "http://www.example.org/tilasto/;), "/", 
strafter(str(?o_before), "http://www.example.org/rdf/data/gsimsf/Variable/;))) 
as ?o_after)
  }
};

# Alter: Variable as subject
delete{
  graph ?g {
?s_before ?p ?o.
  }
}
insert{
  graph ?g {
?s_after ?p ?o.
  }
}
where{
  {
graph ?g { }.
  }
  graph ?g {
?s_before a .
?s_before ?p ?o.
bind(iri(concat("http://www.example.org/rdf/data/gsimsf/Variable/;, 
strafter(str(?g), "http://www.example.org/tilasto/;), "/", 
strafter(str(?s_before), "http://www.example.org/rdf/data/gsimsf/Variable/;))) 
as ?s_after)
  }
};


# Alter: EnumeratedValueDomain as object
delete{
  graph ?g {
?s ?p ?o_before.
  }
}
insert{
  graph ?g {
?s ?p ?o_after.
  }
}
where{
  {
graph ?g { }.
  }
  graph ?g {
?o_before a 
.
?s ?p ?o_before.

bind(iri(concat("http://www.example.org/rdf/data/gsimsf/EnumeratedValueDomain/;,
 strafter(str(?g), "http://www.example.org/tilasto/;), "/", 
strafter(str(?o_before), 
"http://www.example.org/rdf/data/gsimsf/EnumeratedValueDomain/;))) as ?o_after)
  }
};

# Alter: EnumeratedValueDomain as subject
delete{
  graph ?g {
?s_before ?p ?o.
  }
}
insert{
  graph ?g {
?s_after ?p ?o.
  }
}
where{
  {
graph ?g { }.
  }
  graph ?g {
?s_before a 
.
?s_before ?p ?o.

bind(iri(concat("http://www.example.org/rdf/data/gsimsf/EnumeratedValueDomain/;,
 strafter(str(?g), 

jena-fuseki 3.17 wiped out some data

2021-08-11 Thread jaanam

Hello,

My jena-fuseki database consists of several named graphs. In order to 
provide users a graphql-like interface to jena-fuseki I have to combine my 
NGs into one big default graph for HyperGraphQL 
(https://www.hypergraphql.org/) that provides the interface.


At some point the users started to get less data than before, and when I 
investigated the issue I noticed that this was after upgrading 
jena-fuseki from 3.13 to 3.17!


To combine the NGs I'm using the following command:

  curl -i -H "Content-Type: application/sparql-update"  -X POST 
http://:8061//update --data-binary 
"@./combine-NGs.sparql"


(see NGs_to_be_combined.txt for hint of my source database and 
combine-NGs.sparql as the executed script).


So I run the combine-NGs.sparql-script in the target jena host and the 
target dataset.


If I use an in-memory dataset as my target dataset I get only half of the 
triples compared to the amount of triples with a persistent dataset. 
This happens only with jena-fuseki 3.17.


In 3.13 I haven't seen this issue!

Br, Jaana




drop all;

# Copy all NGs into local
insert{
  graph ?g {
?s ?p ?o
  }
}
where{
  service :8061//query> {# original 
RDF-dataset
{
  graph ?g { }.
}
graph ?g {
  ?s ?p ?o
}
  }
};

# Alter: Variable as object
delete{
  graph ?g {
?s ?p ?o_before.
  }
}
insert{
  graph ?g {
?s ?p ?o_after.
  }
}
where{
  {
graph ?g { }.
  }
  graph ?g {
?o_before a .
?s ?p ?o_before.
bind(iri(concat("http://www./rdf/data/gsimsf/Variable/", 
strafter(str(?g), "http://www./tilasto/"), "/", strafter(str(?o_before), 
"http://www./rdf/data/gsimsf/Variable/"))) as ?o_after)
  }
};

# Alter: Variable as subject
delete{
  graph ?g {
?s_before ?p ?o.
  }
}
insert{
  graph ?g {
?s_after ?p ?o.
  }
}
where{
  {
graph ?g { }.
  }
  graph ?g {
?s_before a .
?s_before ?p ?o.
bind(iri(concat("http://www./rdf/data/gsimsf/Variable/", 
strafter(str(?g), "http://www./tilasto/"), "/", strafter(str(?s_before), 
"http://www./rdf/data/gsimsf/Variable/"))) as ?s_after)
  }
};


# Alter: EnumeratedValueDomain as object
delete{
  graph ?g {
?s ?p ?o_before.
  }
}
insert{
  graph ?g {
?s ?p ?o_after.
  }
}
where{
  {
graph ?g { }.
  }
  graph ?g {
?o_before a 
.
?s ?p ?o_before.

bind(iri(concat("http://www./rdf/data/gsimsf/EnumeratedValueDomain/", 
strafter(str(?g), "http://www./tilasto/"), "/", strafter(str(?o_before), 
"http://www./rdf/data/gsimsf/EnumeratedValueDomain/"))) as ?o_after)
  }
};

# Alter: EnumeratedValueDomain as subject
delete{
  graph ?g {
?s_before ?p ?o.
  }
}
insert{
  graph ?g {
?s_after ?p ?o.
  }
}
where{
  {
graph ?g { }.
  }
  graph ?g {
?s_before a 
.
?s_before ?p ?o.

bind(iri(concat("http://www./rdf/data/gsimsf/EnumeratedValueDomain/", 
strafter(str(?g), "http://www./tilasto/"), "/", strafter(str(?s_before), 
"http://www./rdf/data/gsimsf/EnumeratedValueDomain/"))) as ?s_after)
  }
};

# Alter: DescribedValueDomain as object
delete{
  graph ?g {
?s ?p ?o_before.
  }
}
insert{
  graph ?g {
?s ?p ?o_after.
  }
}
where{
  {
graph ?g { }.
  }
  graph ?g {
?o_before a .
?s ?p ?o_before.
bind(iri(concat("http://www./rdf/data/gsimsf/DescribedValueDomain/", 
strafter(str(?g), "http://www./tilasto/"), "/", strafter(str(?o_before), 
"http://www./rdf/data/gsimsf/DescribedValueDomain/"))) as ?o_after)
  }
};

# Alter: DescribedValueDomain as subject
delete{
  graph ?g {
?s_before ?p ?o.
  }
}
insert{
  graph ?g {
?s_after ?p ?o.
  }
}
where{
  {
graph ?g { }.
  }
  graph ?g {
?s_before a .
?s_before ?p ?o.
bind(iri(concat("http://www./rdf/data/gsimsf/DescribedValueDomain/", 
strafter(str(?g), "http://www./tilasto/"), "/", strafter(str(?s_before), 
"http://www./rdf/data/gsimsf/DescribedValueDomain/"))) as ?s_after)
  }
};


# Add ?CodedVariable 
 
?Variable
insert{
  graph ?g {
?CodedVariable 
 
?Variable
  }
}
where{
  {
graph ?g { }.
  }
  graph ?g {
?Variable a ;
   
?EnumeratedValueDomain.
?CodedVariable a  ;
   
?EnumeratedValueDomain.
  }
};

# Add ?NumericalVariable 
 
?Variable
insert{
  graph ?g {
?NumericalVariable 

Re: Fwd: Re: updating persistent jena-fuseki dataset increases memory consumption in gigas

2021-04-09 Thread jaanam

Hi,

Could you suggest an optimal jena-fuseki heap size for my case? I'm 
sending a 50 MB file to my jena-fuseki memory-based dataset every 5 
minutes.


Jaana

(and should this be set on the JVM, actually?)

jaa...@kolumbus.fi kirjoitti 8.4.2021 18:03:

Hello,

Still one question regarding this old issue. The previous answer said:


The heap size by default is quite small in the scripts. It might be an
idea to increase it a bit to give query working space but 0.5 million
is really not very big.


What would be a suitable heap size in my case?
(And then a very stupid additional question: if I'm running the JVM and
jena-fuseki in the same docker container, there's a risk that the JVM
would take all free memory, thus I've set the JVM heap size to 2 G
using JVM_ARGS=-Xmx2g. So, which variables should I use to set the
heap size for jena-fuseki?)
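
For what it's worth: Fuseki runs inside that same JVM, so JVM_ARGS is the
right knob and there is no separate jena-fuseki heap variable. A minimal
sketch, matching the usage shown elsewhere in this archive:

JVM_ARGS="-Xmx2g" ./fuseki-server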

Br, Jaana

Andy Seaborne kirjoitti 10.3.2021 17:04:

On 10/03/2021 02:33, jaa...@kolumbus.fi wrote:

Hi, Thanks for your quick answer and please see my answers below!


How many triples?
And is is new data to replace the old data or in addition to the 
existing data?


476955 triples; most parts will be just the same as the old data, just 
some triples may change. And some new triples may be added.



This is a TDB1 database?


jena-fuseki UI does not mention TDB1, but this is persistent and not 
TDB2.


But in our use case memory-based datasets might also work; as far as 
I've been testing on my PC they seem to work even better than 
persistent ones. What do you think?


In-memory should be fine. Obviously, it's lost when the server exits
but it sounds like the data isn't the primary copy and loading 476955
triples at start up is not big.

The heap size by default is quite small in the scripts. It might be an
idea to increase it a bit to give query working space but 0.5 million
is really not very big.

Andy



Br Jaana



Andy Seaborne kirjoitti 9.3.2021 19:58:

Hi Jaana,

On 09/03/2021 11:40, jaa...@kolumbus.fi wrote:

hello,

I've met the following problem with jena-fuseki (should I create a bug 
ticket?):


We need to update the jena-fuseki dataset every 5 minutes with a 50 
Mbyte ttl-file.


How many triples?
And is is new data to replace the old data or in addition to the 
existing data?


This causes the memory consumption on the machine where jena-fuseki 
is running to increase by gigabytes.


This was 1st detected with jena-fuseki 3.8 and later with 
jena-fuseki 3.17.


To be exact, I executed blankdots/jena-fuseki:fuseki3.17.0 in a 
docker container, continuously posting that ttl-file into the same 
dataset (pxmeta_hub_fed_prod).


This is a TDB1 database?

TDB2 is better at this - the database still grows but there is a way
to compact the database live.

JENA-1987 exposes the compaction in Fuseki.
https://jena.apache.org/documentation/tdb2/tdb2_admin.html
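
For reference, a sketch of triggering live compaction per that page
(dataset name from this thread; verify the endpoint against your Fuseki
version):

curl -XPOST http://localhost:3030/$/compact/pxmeta_hub_fed_prod

# or offline, with the TDB2 command line tools:
tdb2.tdbcompact --loc=/data/fuseki/databases/pxmeta_hub_fed_prod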

The database grows for two reasons: it allocates space in sparse files
in 8M chunks but the space does not count in du until actually used.
The space for deleted data is not fully recycled across transactions
because it may be in-use in a concurrent operation. (TDB1 would be
very difficult to do block ref counting; in TDB2 the solution is
compaction.)

    Andy



See the output of the command "du -h | sort -hr | head -30" below. 
Attached is the shell-script that I was executing during the time 
period.


root@3d53dc3fdf8d:/#alias du3="du -h | sort -hr|head -30"
root@3d53dc3fdf8d:/# du3
9.0G    .
8.5G    ./data/fuseki/databases/pxmeta_hub_fed_prod
8.5G    ./data/fuseki/databases
8.5G    ./data/fuseki
8.5G    ./data



root@3d53dc3fdf8d:/# date
Tue Mar  9 06:02:46 UTC 2021
root@3d53dc3fdf8d:/#




3.5G    .
3.0G    ./data/fuseki/databases/pxmeta_hub_fed_prod
3.0G    ./data/fuseki/databases
3.0G    ./data/fuseki
3.0G    ./data
root@3d53dc3fdf8d:/# date
Tue Mar  9 05:28:09 UTC 2021
root@3d53dc3fdf8d:/#

Br, Jaana
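
A minimal sketch of the live compaction mentioned above, assuming the 
admin protocol is exposed on port 3030 and the dataset name from this 
thread; the deleteOld parameter exists only in newer Fuseki releases:

 # Ask Fuseki to compact the TDB2 database while the server stays up
 curl -i -X POST 'http://localhost:3030/$/compact/pxmeta_hub_fed_prod'

 # Newer servers can also delete the superseded Data-NNNN generation
 curl -i -X POST 'http://localhost:3030/$/compact/pxmeta_hub_fed_prod?deleteOld=true'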


Fwd: Re: updating persistent jena-fuseki dataset increases memory consumption in gigas

2021-04-08 Thread jaanam



Hello,

Still one question regarding this old issue. The previous answer said:


The heap size by default is quite small in the scripts. It might be an
idea to increase it a bit to give query working space but 0.5 million
is really not very big.


What would be the suitable heap size in my case ?
(And then a very stupid additional question: If I'm running the JVM and 
jena-fuseki in the same docker container, there's a risk that the JVM would 
take all free memory, thus I've set the JVM heap size to 2 G using 
JVM_ARGS=-Xmx2g. So, which variables should I use to set the heap size 
for jena-fuseki ? )


Br, Jaana

Andy Seaborne wrote 10.3.2021 17:04:

On 10/03/2021 02:33, jaa...@kolumbus.fi wrote:

Hi, thanks for your quick answer and please see my answers below!


How many triples?
And is it new data to replace the old data or in addition to the 
existing data?


476955 triples; most of it will be just the same as the old data, just 
some triples may change. And some new triples may be added.



This is a TDB1 database?


jena-fuseki UI does not mention TDB1, but this is persistent and not 
TDB2.


But in our use case memory-based datasets might also work; as far as 
I've been testing on my PC they seem to work even better than 
persistent ones. What do you think ?


In-memory should be fine. Obviously, it's lost when the server exits
but it sounds like the data isn't the primary copy and loading 476955
triples at start up is not big.

The heap size by default is quite small in the scripts. It might be an
idea to increase it a bit to give query working space but 0.5 million
is really not very big.

Andy



Br Jaana



Andy Seaborne wrote 9.3.2021 19:58:

Hi Jaana,

On 09/03/2021 11:40, jaa...@kolumbus.fi wrote:

hello,

I've met the following problem with jena-fuseki (should I create a bug 
ticket ?):


We need to update a jena-fuseki dataset every 5 minutes with a 50 Mbyte 
ttl-file.


How many triples?
And is it new data to replace the old data or in addition to the 
existing data?


This causes the memory consumption on the machine where jena-fuseki 
is running to increase by gigabytes.


This was 1st detected with jena-fuseki 3.8 and later with 
jena-fuseki 3.17.


To be exact I executed blankdots/jena-fuseki:fuseki3.17.0 in a 
docker container, continuously posting that ttl-file into the same 
dataset (pxmeta_hub_fed_prod).


This is a TDB1 database?

TDB2 is better at this - the database still grows but there is a way
to compact the database live.

JENA-1987 exposes the compaction in Fuseki.
https://jena.apache.org/documentation/tdb2/tdb2_admin.html

The database grows for two reasons: it allocates space in sparse files
in 8M chunks but the space does not count in du until actually used.
The space for deleted data is not fully recycled across transactions
because it may be in-use in a concurrent operation. (TDB1 would be
very difficult to do block ref counting; in TDB2 the solution is
compaction.)

    Andy



see the output of the command "du -h | sort -hr|head -30" below; 
attached is the shell-script that I was executing during the time 
period.


root@3d53dc3fdf8d:/#alias du3="du -h | sort -hr|head -30"
root@3d53dc3fdf8d:/# du3
9.0G    .
8.5G    ./data/fuseki/databases/pxmeta_hub_fed_prod
8.5G    ./data/fuseki/databases
8.5G    ./data/fuseki
8.5G    ./data



root@3d53dc3fdf8d:/# date
Tue Mar  9 06:02:46 UTC 2021
root@3d53dc3fdf8d:/#




3.5G    .
3.0G    ./data/fuseki/databases/pxmeta_hub_fed_prod
3.0G    ./data/fuseki/databases
3.0G    ./data/fuseki
3.0G    ./data
root@3d53dc3fdf8d:/# date
Tue Mar  9 05:28:09 UTC 2021
root@3d53dc3fdf8d:/#

Br, Jaana


Re: updating persistent jena-fuseki dataset increases memory consumption in gigas

2021-03-30 Thread jaanam

Hello,

I've been trying TDB2 with compact. I have 2 TDB2 datasets in my 
jena-fuseki. Both of them get a 50 MB upload every 5 minutes.


At the same time they are compacted hourly by the attached script.

At some point I start getting these messages:

 + curl -i -XPOST 'localhost:8061/$/compact/pxmeta_hub_fed_prod'
   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                  Dload  Upload   Total   Spent    Left  Speed
   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
 100    59  100    59    0     0  59000      0 --:--:-- --:--:-- --:--:-- 59000

 HTTP/1.1 400 Bad Request
 Date: Tue, 30 Mar 2021 23:54:47 GMT
 Fuseki-Request-Id: 2706
 Content-Type: text/plain;charset=utf-8
 Cache-Control: must-revalidate,no-cache,no-store
 Pragma: no-cache
 Content-Length: 59

 Async task request rejected - exceeds the limit of 4 tasks

After this the dataset in question no longer returns results for queries 
invoked in the UI.


What is wrong now ?

Br, Jaana
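
One hedged workaround sketch, based only on the 400 response shown 
above: back off and retry until the admin endpoint accepts the task, 
instead of submitting a new compaction every hour regardless of the 
outcome:

 # Retry the compaction until Fuseki accepts it (HTTP 200); a 400 here
 # means the async task queue is still full, so wait and try again
 until [ "$(curl -s -o /dev/null -w '%{http_code}' \
       -X POST 'http://localhost:8061/$/compact/pxmeta_hub_fed_prod')" = "200" ]
 do
   sleep 60
 done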


Andy Seaborne wrote 9.3.2021 19:58:

Hi Jaana,

On 09/03/2021 11:40, jaa...@kolumbus.fi wrote:

hello,

I've met the following problem with jena-fuseki (should I create a bug 
ticket ?):


We need to update a jena-fuseki dataset every 5 minutes with a 50 Mbyte 
ttl-file.


How many triples?
And is it new data to replace the old data or in addition to the 
existing data?


This causes the memory consumption on the machine where jena-fuseki is 
running to increase by gigabytes.


This was 1st detected with jena-fuseki 3.8 and later with jena-fuseki 
3.17.


To be exact I executed blankdots/jena-fuseki:fuseki3.17.0 in a docker 
container, continuously posting that ttl-file into the same dataset 
(pxmeta_hub_fed_prod).


This is a TDB1 database?

TDB2 is better at this - the database still grows but there is a way
to compact the database live.

JENA-1987 exposes the compaction in Fuseki.
https://jena.apache.org/documentation/tdb2/tdb2_admin.html

The database grows for two reasons: it allocates space in sparse files
in 8M chunks but the space does not count in du until actually used.
The space for deleted data is not fully recycled across transactions
because it may be in-use in a concurrent operation. (TDB1 would be
very difficult to do block ref counting; in TDB2 the solution is
compaction.)

Andy



see the output of the command "du -h | sort -hr|head -30" below; attached 
is the shell-script that I was executing during the time period.


root@3d53dc3fdf8d:/#alias du3="du -h | sort -hr|head -30"
root@3d53dc3fdf8d:/# du3
9.0G    .
8.5G    ./data/fuseki/databases/pxmeta_hub_fed_prod
8.5G    ./data/fuseki/databases
8.5G    ./data/fuseki
8.5G    ./data



root@3d53dc3fdf8d:/# date
Tue Mar  9 06:02:46 UTC 2021
root@3d53dc3fdf8d:/#




3.5G    .
3.0G    ./data/fuseki/databases/pxmeta_hub_fed_prod
3.0G    ./data/fuseki/databases
3.0G    ./data/fuseki
3.0G    ./data
root@3d53dc3fdf8d:/# date
Tue Mar  9 05:28:09 UTC 2021
root@3d53dc3fdf8d:/#

Br, Jaana
#!/bin/sh
set -x
while :
do
	sleep 1h
	date=$(date)
	# Request a live compaction, then prune old Data-* generations.
	# NB: /$/compact runs asynchronously, so the deletion below starts
	# as soon as the request is submitted, not when compaction finishes.
	handle_dataset()
	{
		echo "starting to compact" $date $1 >>log.txt
		curl -i -XPOST localhost:3031/$/compact/$1
		cd ~/jena/apache-jena-fuseki-3.17.0/run/databases/$1
		count=$(ls | grep -c Data-)
		# Delete all but the newest generation (lexicographic order)
		for FILE in Data-*; do
			if [ $count -gt 1 ]
			then
				echo "deleting" $FILE >>log.txt
				rm -rf $FILE
				count=$((count - 1))
			fi
		done
	}
	handle_dataset pxmeta_hub_fed
	handle_dataset pxmeta_hub_fed_prod
done



regularly backup jena-fuseki version 3.7.0.

2021-03-29 Thread jaanam

Hello,

I'm running jena-fuseki 3.7.0

I should implement regular backups for the database, but don't know 
how to do it. Based on the documents found on the web I've understood 
that before 3.17 it wouldn't even be possible to take backups from a 
running server. Is that true ? If not, please tell me how to take the 
backup using the linux command line!


Br, Jaana
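
For what it's worth, a sketch of the backup call in the Fuseki2 HTTP 
admin protocol, with /ds standing in for the real dataset name; whether 
3.7.0 already ships this endpoint would need checking against that 
release:

 # Ask the running server to write a compressed N-Quads backup
 # into the backups/ directory under FUSEKI_BASE
 curl -i -X POST 'http://localhost:3030/$/backup/ds'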


Re: jena-fuseki's memory consumption keeps on growing

2021-03-23 Thread jaanam

Hi,

Don't understand. I'm running docker image 
blankdots/jena-fuseki:fuseki3.17.0, which has java 14.0.2. Should it 
include java 12 features, thus also JEP 346 ?


Jaana

Andy Seaborne wrote 22.3.2021 14:51:

Java memory will grow to reach about the heap size. This is a java
thing - it does not perform the more expensive forms of garbage
collection until the heap is nearly full. Then it does a more savage
GC and the memory usage will drop (but not the OS process size).

To check - attach a tool such as VisualVM and ask it to force a GC and
see what the memory graph does.

Returning memory to the OS is JEP 346 (for G1 at least), which is in
Java 12 (it may be experimental there)

Andy
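
A hedged sketch of what that looks like as JVM options, assuming a G1 
collector on Java 12 or later; the periodic-GC flag comes from JEP 346:

 # Cap the heap and ask G1 to run periodic GCs so idle committed
 # memory can be returned to the OS (interval in milliseconds)
 JVM_ARGS="-Xmx2g -XX:+UseG1GC -XX:G1PeriodicGCInterval=60000" \
     ./fuseki-server --mem /ds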

On 22/03/2021 10:59, jaa...@kolumbus.fi wrote:

Hello,

I'm running jena-fuseki 3.17.0 in azure cloud with memory-based 
datasets. I haven't been updating any data into my datasets for one 
week, but I just noticed that the memory consumption still keeps on 
increasing (see attachment).


There's another service that queries data from that 
jena-fuseki-instance every minute via hypergraphql-interface. Could 
jena-fuseki somehow cache those requests causing memory consumption 
increase ?


If so, is there any means for preventing such caching ?

Br, Jaana


copy data from another dataset with SPARQL

2021-03-23 Thread jaanam

Hello,

Let's say I have two datasets in my jena-fuseki server. I need to copy 
the named graphs from dataset1 to dataset2, where I'll combine them into 
one default graph.


I got that copying to work in dataset2 of my test server using the trick 
below:


insert {
  graph ?g {
    ?s ?p ?o
  }
}
where {
  service :3030//query> {	# original RDF-dataset
    {
      graph ?g { }.
    }
    graph ?g {
      ?s ?p ?o
    }
  }
};

But I also need to run the same in the azure cloud, where I get an error 
message of Unauthorized operation. Is there any means to configure the 
needed credentials somewhere for jena, or could the same thing (copy 
dataset1 to dataset2) be done alternatively without need for permissions 
?


Br Jaana
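
An alternative sketch that avoids the cross-service SPARQL call 
entirely, assuming both datasets sit on the same server and with 
ds1/ds2 standing in for the real names: stream all graphs out as quads 
and post them back in:

 # Dump every graph of ds1 as N-Quads and load the stream into ds2;
 # named graphs are preserved because each quad carries its graph name
 curl -s 'http://localhost:3030/ds1' -H 'Accept: application/n-quads' \
   | curl -s -X POST 'http://localhost:3030/ds2' \
          -H 'Content-Type: application/n-quads' --data-binary @-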



Re: jena-fuseki's memory consumption keeps on growing

2021-03-22 Thread jaanam

Hello,

thanks for your answer, but my dataset was an in-memory one, that's why I 
was confused by its behaviour. But I'll now recreate it and try to 
reproduce the issue to be absolutely sure.


Jaana

Rob Vesse wrote 22.3.2021 13:36:

I assume that this is TDB 1?

It is possible you are encountering the scenario detailed at
https://jena.apache.org/documentation/tdb/faqs.html#fuseki-tdb-memory-leak

If the queries are sufficiently frequent the server may never be able
to flush the in-memory journal leading to continuous memory growth
over time.  The above linked FAQ notes how you can diagnose if this is
indeed the case.

If you are impacted by this you would need to occasionally quiesce the
flow of queries to allow the server to flush the journal fully to disk
and free up the memory allocated to it.

Another alternative solution would be to switch over to using TDB 2
instead since that has a different on-disk memory structure that
avoids the need for updates to be kept in an in-memory journal.  The
trade-off there is that it instead uses more disk space because there
are potentially multiple versions on disk at any one time, so you
would need to compact the database regularly -
https://jena.apache.org/documentation/tdb2/tdb2_admin.html - and/or
allocate more disk space to your cloud instances.

Hope this helps,

Rob

On 22/03/2021, 11:00, "jaa...@kolumbus.fi"  wrote:

Hello,

I'm running jena-fuseki 3.17.0 in azure cloud with memory-based
datasets. I haven't been updating any data into my datasets for one
week, but I just noticed that the memory consumption still keeps on
increasing (see attachment).

There's another service that queries data from that 
jena-fuseki-instance
every minute via hypergraphql-interface. Could jena-fuseki somehow 
cache

those requests causing memory consumption increase ?

If so, is there any means for preventing such caching ?

Br, Jaana
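
A hedged first check along the lines of the FAQ Rob links to: with TDB1 
the journal is visible as a file in the database directory, and one 
that stays large between updates suggests it cannot be flushed; the 
path follows the layout seen earlier in this thread:

 # A journal.jrnl that never shrinks while queries keep running is the
 # symptom described in the TDB1 memory-leak FAQ
 ls -lh /data/fuseki/databases/pxmeta_hub_fed_prod/journal.jrnl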


jena-fuseki's memory consumption keeps on growing

2021-03-22 Thread jaanam

Hello,

I'm running jena-fuseki 3.17.0 in azure cloud with memory-based 
datasets. I haven't been updating any data into my datasets for one 
week, but I just noticed that the memory consumption still keeps on 
increasing (see attachment).


There's another service that queries data from that jena-fuseki-instance 
every minute via hypergraphql-interface. Could jena-fuseki somehow cache 
those requests causing memory consumption increase ?


If so, is there any means for preventing such caching ?

Br, Jaana

Re: upload several ngs into dataset by curl

2021-03-16 Thread jaanam

hello,

I wasn't sure whether it is a turtle-file or not, but it was created by 
running curl GET. And yes, my problem is that I need to upload a dataset 
(including 124 NGs) from one jena-fuseki server to another using just 
one command.


And thanks for your answer, I'll try those SOH-commands, hopefully they 
help,


Jaana


Lorenz Buehmann wrote 16.3.2021 14:08:

Ok, even more confusing, this is a sequence of SPARQL update commands,
right? What exactly is the problem with it? You don't want to run 124
curl calls?

Can't you use SOH commands like s-put [1]

[1] https://jena.apache.org/documentation/fuseki2/soh.html

On 16.03.21 13:04, Lorenz Buehmann wrote:
I'm confused. Why is this a Turtle file? Turtle doesn't contain quads. 
It should be a Trig file with .trig being the file extension.


On 16.03.21 12:58, jaa...@kolumbus.fi wrote:

Hi,

is it possible to upload several NGs (from a ttl-file) into a 
jena-fuseki dataset with just one curl command ?


I mean this:

    curl -i -H "Content-Type: application/sparql-update"  -X POST 
http://localhost:3030/pxmeta_hub_fed/update --data-binary 
"@test.ttl",


where my test.ttl is printed at the end of this e-mail.

The problem is that we have 124 NGs in our system and it is not 
possible to upload them all by separate commands, so it is not possible 
to define the target graph in the curl command.



   cat test.ttl

   drop all;
   insert data
   {
          "two" , "one" .

      {
             "five" , "six" .
      }

      {
             "three" , "four" .
      }
   }

br, Jaana M
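
A sketch of the SOH route Lorenz suggests, assuming the soh scripts 
from the Fuseki distribution are on the PATH; the graph IRI and file 
name here are made up:

 # One s-put call per named graph: dataset endpoint, graph name, file
 s-put http://localhost:3030/pxmeta_hub_fed http://example/graph1 graph1.ttl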


upload several ngs into dataset by curl

2021-03-16 Thread jaanam

Hi,

is it possible to upload several NGs (from a ttl-file) into a jena-fuseki 
dataset with just one curl command ?


I mean this:

curl -i -H "Content-Type: application/sparql-update"  -X POST 
http://localhost:3030/pxmeta_hub_fed/update --data-binary "@test.ttl",


where my test.ttl is printed at the end of this e-mail.

The problem is that we have 124 NGs in our system and it is not possible 
to upload them all by separate commands, so it is not possible to define 
the target graph in the curl command.



   cat test.ttl

   drop all;
   insert data
   {
          "two" , "one" .

      {
             "five" , "six" .
      }

      {
             "three" , "four" .
      }
   }

br, Jaana M
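
A sketch of the single-command route, assuming the 124 graphs are 
serialized as quads (e.g. TriG) and posted to the dataset endpoint 
rather than the update endpoint:

 # Each GRAPH block in the TriG file lands in its own named graph,
 # so one upload covers all 124 NGs
 curl -i -X POST 'http://localhost:3030/pxmeta_hub_fed' \
      -H 'Content-Type: application/trig' --data-binary '@test.trig'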


Re: updating persistent jena-fuseki dataset increases memory consumption in gigas

2021-03-12 Thread jaanam

Hello,
unfortunately in-memory also appeared to be problematic in our use case.

I was running the attached update-script continuously for 5 hours in an 
in-memory dataset (pxmeta_hub_fed).


After 5 hours I got this:


 OpenJDK 64-Bit Server VM warning: INFO: 
os::commit_memory(0x0007b9c0, 102760448, 0) failed; error='Not 
enough space' (errno=12)
 
 # There is insufficient memory for the Java Runtime Environment to 
continue.
 # Native memory allocation (mmap) failed to map 102760448 bytes for 
committing reserved memory.

 # An error report file with more information is saved as:
 # /jena-fuseki/hs_err_pid1.log
 miettinj@sinivalas1:~> docker ps

jena-fuseki was executed in a docker container with 
miettinj/pxpro-jena-fuseki:fuseki3.17.0, which is the same image as 
blankdots/jena-fuseki:fuseki3.17.0 except that the 1st one uses 
JVM_ARGS=-Xmx2g.


So it seems that memory consumption also increases when using in-memory 
databases.


Would you suggest any idea for a fix ?

Br, Jaana
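
A hedged sketch of one mitigation: the failure above is a native mmap 
failure, so give the container a hard limit comfortably above the Java 
heap to leave headroom for the JVM's non-heap memory; this assumes the 
image passes JVM_ARGS through like the stain-based ones:

 # Heap at 2g, container limit at 3g so native allocations (metaspace,
 # GC bookkeeping, mapped files) still have room below the cap
 docker run -p 3030:3030 --memory=3g -e JVM_ARGS=-Xmx2g \
     blankdots/jena-fuseki:fuseki3.17.0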


Andy Seaborne wrote 10.3.2021 17:04:

On 10/03/2021 02:33, jaa...@kolumbus.fi wrote:

Hi, thanks for your quick answer and please see my answers below!


How many triples?
And is it new data to replace the old data or in addition to the 
existing data?


476955 triples; most of it will be just the same as the old data, just 
some triples may change. And some new triples may be added.



This is a TDB1 database?


jena-fuseki UI does not mention TDB1, but this is persistent and not 
TDB2.


But in our use case memory-based datasets might also work; as far as 
I've been testing on my PC they seem to work even better than 
persistent ones. What do you think ?


In-memory should be fine. Obviously, it's lost when the server exits
but it sounds like the data isn't the primary copy and loading 476955
triples at start up is not big.

The heap size by default is quite small in the scripts. It might be an
idea to increase it a bit to give query working space but 0.5 million
is really not very big.

Andy



Br Jaana



Andy Seaborne wrote 9.3.2021 19:58:

Hi Jaana,

On 09/03/2021 11:40, jaa...@kolumbus.fi wrote:

hello,

I've met the following problem with jena-fuseki (should I create a bug 
ticket ?):


We need to update a jena-fuseki dataset every 5 minutes with a 50 Mbyte 
ttl-file.


How many triples?
And is it new data to replace the old data or in addition to the 
existing data?


This causes the memory consumption on the machine where jena-fuseki 
is running to increase by gigabytes.


This was 1st detected with jena-fuseki 3.8 and later with 
jena-fuseki 3.17.


To be exact I executed blankdots/jena-fuseki:fuseki3.17.0 in a 
docker container, continuously posting that ttl-file into the same 
dataset (pxmeta_hub_fed_prod).


This is a TDB1 database?

TDB2 is better at this - the database still grows but there is a way
to compact the database live.

JENA-1987 exposes the compaction in Fuseki.
https://jena.apache.org/documentation/tdb2/tdb2_admin.html

The database grows for two reasons: it allocates space in sparse files
in 8M chunks but the space does not count in du until actually used.
The space for deleted data is not fully recycled across transactions
because it may be in-use in a concurrent operation. (TDB1 would be
very difficult to do block ref counting; in TDB2 the solution is
compaction.)

    Andy



see the output of the command "du -h | sort -hr|head -30" below; 
attached is the shell-script that I was executing during the time 
period.


root@3d53dc3fdf8d:/#alias du3="du -h | sort -hr|head -30"
root@3d53dc3fdf8d:/# du3
9.0G    .
8.5G    ./data/fuseki/databases/pxmeta_hub_fed_prod
8.5G    ./data/fuseki/databases
8.5G    ./data/fuseki
8.5G    ./data



root@3d53dc3fdf8d:/# date
Tue Mar  9 06:02:46 UTC 2021
root@3d53dc3fdf8d:/#




3.5G    .
3.0G    ./data/fuseki/databases/pxmeta_hub_fed_prod
3.0G    ./data/fuseki/databases
3.0G    ./data/fuseki
3.0G    ./data
root@3d53dc3fdf8d:/# date
Tue Mar  9 05:28:09 UTC 2021
root@3d53dc3fdf8d:/#

Br, Jaana
drop all;

#
# A sparql to combine named graphs in server   into one NG.
# This script should always be executed in the target dataset of the target
# jena-fuseki-server.
#
insert {
  graph ?g {
    ?s ?p ?o
  }
}
where {
  service  {	# original RDF-dataset
    {
      graph ?g { }.
    }
    graph ?g {
      ?s ?p ?o
    }
  }
};

# Transformation: Variable as object
delete {
  graph ?g {
    ?s ?p ?o_before.
  }
}
insert {
  graph ?g {
    ?s ?p ?o_after.
  }
}
where {
  {
    graph ?g { }.
  }
  graph ?g {
    ?o_before a .
    ?s ?p ?o_before.
    bind(iri(concat("http://www./rdf/data/gsimsf/Variable/",
        strafter(str(?g), "http://www./tilasto/"), "/", strafter(str(?o_before),
        "http://www./rdf/data/gsimsf/Variable/"))) as ?o_after)
  }
};

# Transformation: Variable

sed: couldn't close /data/fuseki/sedh5alYI: No space left on device

2021-03-10 Thread jaanam

Hello,

I'm trying to run the jena-fuseki 3.17 docker image in the azure cloud. In 
order to prevent the JVM from taking all memory I pulled 
blankdots/jena-fuseki:fuseki3.17.0 from Docker Hub, replaced it with a 
custom image miettinj/pxpro-jena-fuseki:fuseki3.17.0 that uses 
JVM_ARGS=-Xmx2g, and pushed miettinj/pxpro-jena-fuseki:fuseki3.17.0 to 
Docker Hub.


When I try to run my new image in the azure cloud it fails with the error: 
"sed: couldn't close /data/fuseki/sedh5alYI: No space left on device"


What can this mean, and would you have any idea about the solution ?

Br, Jaana

2021-03-11T03:22:13.799Z INFO  - Pulling image from Docker hub: 
miettinj/pxpro-jena-fuseki:fuseki3.17.0


2021-03-11T03:22:14.946Z INFO  - fuseki3.17.0 Pulling from 
miettinj/pxpro-jena-fuseki
2021-03-11T03:22:14.959Z INFO  -  Digest: 
sha256:df833976b8f3c1d4392609bd7511aef1c24c3e9189f2930512440f63b29eec2b
2021-03-11T03:22:14.961Z INFO  -  Status: Image is up to date for 
miettinj/pxpro-jena-fuseki:fuseki3.17.0
2021-03-11T03:22:14.980Z INFO  - Pull Image successful, Time taken: 0 
Minutes and 1 Seconds

2021-03-11T03:22:14.987Z INFO  - Starting container for site
2021-03-11T03:22:14.989Z INFO  - docker run -d -p 5381:3030 --name 
-jena-fuseki-cont_0_e4e461e1 -e 
WEBSITES_ENABLE_APP_SERVICE_STORAGE=false -e 
WEBSITE_SITE_NAME=-jena-fuseki-cont -e 
WEBSITE_AUTH_ENABLED=False -e PORT=3030 -e WEBSITE_ROLE_INSTANCE_ID=0 -e 
WEBSITE_HOSTNAME=-jena-fuseki-cont.azurewebsites.net -e 
WEBSITE_INSTANCE_ID=4626886fa06471c0a0006f3b9b70852c5a1f20ef21a46c118d372f6d11898471 
-e HTTP_LOGGING_ENABLED=1 miettinj/pxpro-jena-fuseki:fuseki3.17.0
2021-03-11T03:22:16.211Z INFO  - Initiating warmup request to container 
-jena-fuseki-cont_0_e4e461e1 for site 
-jena-fuseki-cont


2021-03-11T03:22:16.447796968Z sed: couldn't close 
/data/fuseki/sedh5alYI: No space left on device
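
A hedged first diagnostic, assuming shell access into the running 
container: the failing temp file sits under /data/fuseki, so check how 
much space that mount actually has on the App Service plan:

 # sed writes its temp file next to the file it edits; if this
 # filesystem is full or tiny, startup fails exactly as above
 df -h /data/fuseki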


Re: updating persistent jena-fuseki dataset increases memory consumption in gigas

2021-03-09 Thread jaanam

Hi, thanks for your quick answer and please see my answers below!


How many triples?
And is it new data to replace the old data or in addition to the 
existing data?


476955 triples; most of it will be just the same as the old data, just some 
triples may change. And some new triples may be added.



This is a TDB1 database?


jena-fuseki UI does not mention TDB1, but this is persistent and not 
TDB2.


But in our use case memory-based datasets might also work; as far as 
I've been testing on my PC they seem to work even better than persistent 
ones. What do you think ?


Br Jaana



Andy Seaborne wrote 9.3.2021 19:58:

Hi Jaana,

On 09/03/2021 11:40, jaa...@kolumbus.fi wrote:

hello,

I've met the following problem with jena-fuseki (should I create a bug 
ticket ?):


We need to update a jena-fuseki dataset every 5 minutes with a 50 Mbyte 
ttl-file.


How many triples?
And is it new data to replace the old data or in addition to the 
existing data?


This causes the memory consumption on the machine where jena-fuseki is 
running to increase by gigabytes.


This was 1st detected with jena-fuseki 3.8 and later with jena-fuseki 
3.17.


To be exact I executed blankdots/jena-fuseki:fuseki3.17.0 in a docker 
container, continuously posting that ttl-file into the same dataset 
(pxmeta_hub_fed_prod).


This is a TDB1 database?

TDB2 is better at this - the database still grows but there is a way
to compact the database live.

JENA-1987 exposes the compaction in Fuseki.
https://jena.apache.org/documentation/tdb2/tdb2_admin.html

The database grows for two reasons: it allocates space in sparse files
in 8M chunks but the space does not count in du until actually used.
The space for deleted data is not fully recycled across transactions
because it may be in-use in a concurrent operation. (TDB1 would be
very difficult to do block ref counting; in TDB2 the solution is
compaction.)

Andy



see the output of the command "du -h | sort -hr|head -30" below; attached 
is the shell-script that I was executing during the time period.


root@3d53dc3fdf8d:/#alias du3="du -h | sort -hr|head -30"
root@3d53dc3fdf8d:/# du3
9.0G    .
8.5G    ./data/fuseki/databases/pxmeta_hub_fed_prod
8.5G    ./data/fuseki/databases
8.5G    ./data/fuseki
8.5G    ./data



root@3d53dc3fdf8d:/# date
Tue Mar  9 06:02:46 UTC 2021
root@3d53dc3fdf8d:/#




3.5G    .
3.0G    ./data/fuseki/databases/pxmeta_hub_fed_prod
3.0G    ./data/fuseki/databases
3.0G    ./data/fuseki
3.0G    ./data
root@3d53dc3fdf8d:/# date
Tue Mar  9 05:28:09 UTC 2021
root@3d53dc3fdf8d:/#

Br, Jaana


updating persistent jena-fuseki dataset increases memory consumption in gigas

2021-03-09 Thread jaanam

hello,

I've met the following problem with jena-fuseki (should I create a bug 
ticket ?):


We need to update a jena-fuseki dataset every 5 minutes with a 50 Mbyte 
ttl-file. This causes the memory consumption on the machine where 
jena-fuseki is running to increase by gigabytes.


This was 1st detected with jena-fuseki 3.8 and later with jena-fuseki 
3.17.


To be exact I executed blankdots/jena-fuseki:fuseki3.17.0 in a docker 
container, continuously posting that ttl-file into the same dataset 
(pxmeta_hub_fed_prod).


see the output of the command "du -h | sort -hr|head -30" below; attached 
is the shell-script that I was executing during the time period.


root@3d53dc3fdf8d:/#alias du3="du -h | sort -hr|head -30"
root@3d53dc3fdf8d:/# du3
du: cannot read directory './proc/1/map_files': Permission denied
du: cannot access './proc/96/task/96/fd/4': No such file or directory
du: cannot access './proc/96/task/96/fdinfo/4': No such file or 
directory

du: cannot access './proc/96/fd/3': No such file or directory
du: cannot access './proc/96/fdinfo/3': No such file or directory
9.0G    .
8.5G    ./data/fuseki/databases/pxmeta_hub_fed_prod
8.5G    ./data/fuseki/databases
8.5G    ./data/fuseki
8.5G    ./data
379M    ./usr
321M    ./usr/local/openjdk-14
321M    ./usr/local
239M    ./usr/local/openjdk-14/lib
80M     ./usr/local/openjdk-14/jmods
36M     ./jena-fuseki
34M     ./usr/lib
33M     ./usr/local/openjdk-14/lib/server
31M     ./usr/lib/x86_64-linux-gnu
15M     ./usr/bin
12M     ./lib/x86_64-linux-gnu
12M     ./lib
7.4M    ./usr/lib/x86_64-linux-gnu/gconv
7.0M    ./usr/share
6.5M    ./var
5.1M    ./usr/lib/x86_64-linux-gnu/perl-base
5.1M    ./jena-fuseki/webapp
5.0M    ./bin
4.6M    ./var/lib
4.5M    ./var/lib/dpkg
4.2M    ./var/lib/dpkg/info
4.2M    ./sbin
3.6M    ./usr/sbin
3.6M    ./jena-fuseki/webapp/js
3.4M    ./usr/share/zoneinfo
root@3d53dc3fdf8d:/# date
Tue Mar  9 06:02:46 UTC 2021
root@3d53dc3fdf8d:/#

From: Jaana Miettinen 
Sent: Tuesday 9 March 2021 7.48
To: Jaana Miettinen 
Subject: RE: curl

At 7.46 started curl.sh, which runs yhdistä_NTt.sparql into the same 
container from a 3.0G starting point


root@3d53dc3fdf8d:/# du3
du: cannot read directory './proc/1/map_files': Permission denied
du: cannot access './proc/92/task/92/fd/4': No such file or directory
du: cannot access './proc/92/task/92/fdinfo/4': No such file or 
directory

du: cannot access './proc/92/fd/3': No such file or directory
du: cannot access './proc/92/fdinfo/3': No such file or directory
3.5G    .
3.0G    ./data/fuseki/databases/pxmeta_hub_fed_prod
3.0G    ./data/fuseki/databases
3.0G    ./data/fuseki
3.0G    ./data
379M    ./usr
321M    ./usr/local/openjdk-14
321M    ./usr/local
239M    ./usr/local/openjdk-14/lib
80M     ./usr/local/openjdk-14/jmods
36M     ./jena-fuseki
34M     ./usr/lib
33M     ./usr/local/openjdk-14/lib/server
31M     ./usr/lib/x86_64-linux-gnu
15M     ./usr/bin
12M     ./lib/x86_64-linux-gnu
12M     ./lib
7.4M    ./usr/lib/x86_64-linux-gnu/gconv
7.0M    ./usr/share
6.5M    ./var
5.1M    ./usr/lib/x86_64-linux-gnu/perl-base
5.1M    ./jena-fuseki/webapp
5.0M    ./bin
4.6M    ./var/lib
4.5M    ./var/lib/dpkg
4.2M    ./var/lib/dpkg/info
4.2M    ./sbin
3.6M    ./usr/sbin
3.6M    ./jena-fuseki/webapp/js
3.4M    ./usr/share/zoneinfo
root@3d53dc3fdf8d:/# date
Tue Mar  9 05:28:09 UTC 2021
root@3d53dc3fdf8d:/#

Br, Jaana
#!/bin/sh
# Continuously POST the 50 MB SPARQL update file to the dataset's update endpoint
while :
do
curl -i -H "Content-Type: application/sparql-update"  -H "Authorization: Basic pass" -X POST http://localhost:3030/pxmeta_hub_fed_prod/update --data-binary "@/c/Users/miettinj/PxProGraphQlApi/PxProGraphQlApi/tilastot_dev_bck.ttl"
#curl -i -H "Content-Type: application/sparql-update"  -X POST http://localhost:3030/pxmeta_hub_fed_prod/update --data-binary "@../migration-api/yhdista-NGt-dev.sparql" 
done