Re: Restart during Fuseki compaction

2024-02-06 Thread Andy Seaborne

Hi Samuel,

This is when the server exits for some reason?

(If it's an internal exception, there should be a stack trace in the log 
file.)


What operating system are you running on?

What's in the new Data-0002 directory?

It does look like some defensive measures are needed to not choose to 
use the incomplete storage directory.


Andy


On 06/02/2024 09:26, Samuel Börlin wrote:

Hi everybody,

I recently noticed that when Fuseki (4.10.0) is stopped during a compaction 
task (started via the HTTP endpoint `/$/compact/{name}?deleteOld=true`)
then it uses the new and still incomplete database (e.g. Data-0002 instead of 
the original non-compacted Data-0001) when it is started again.
Is there a way to do compaction in an atomic manner so that this doesn't happen?

As a workaround I'm currently thinking about simply deleting (or perhaps 
renaming/moving) all Data- directories but the one with the lowest index 
when the database is started.
I always use `?deleteOld=true`, so I only ever expect there to be one Data- 
directory when it starts. If there are multiple directories then that means 
that there must have been an incomplete compaction.
Does this seem like a reasonable approach?

Thanks and best regards,
Samuel


Re: question about FROM keyword

2024-02-06 Thread Zlatareva, Neli (Computer Science)
Thank you so much, Andy. I'll try the suggested workarounds.
Really appreciate the help.
Regards, Neli.

Neli P. Zlatareva, PhD
Professor of Computer Science
Department of Computer Science
Central Connecticut State University
New Britain, CT 06050
Phone: (860) 832-2723
Fax: (860) 832-2712
Web site: cs.ccsu.edu/~neli/

From: Andy Seaborne 
Sent: Monday, February 5, 2024 6:03 PM
To: users@jena.apache.org 
Subject: Re: question about FROM keyword

EXTERNAL EMAIL: This email originated from outside of the organization. Do not 
click any links or open any attachments unless you trust the sender and know 
the content is safe.

This is a combination of things happening.

In the one case where no data (graph or dataset) is provided, Jena does
read the URL. If data is supplied, FROM refers to the dataset.

The URL is coming back from 
http://www.learningsparql.com/
as explicitly "Content-Type: text/plain", not "text/turtle".

Jena pretty much ignores "text/plain" because it is usually wrong, so it
tries to guess the syntax.

The URL in the message

   (URI=file:///D:/neli/cs575Spring24/ex070mod2.rq : stream=text/plain)

is misleading - that "URI" is the base URI, not the URI being read.

 > (This specifically may be a bug in the arq tool)

Yes, it is.

Recorded as 
https://github.com/apache/jena/issues/2250

Corrected, the results are:

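-------------------------------------------------------------
| last       | first     | courseName                       |
=============================================================
| "Mutt"     | "Richard" | "Updating Data with SPARQL"      |
| "Mutt"     | "Richard" | "Using SPARQL with non-RDF Data" |
| "Marshall" | "Cindy"   | "Modeling Data with OWL"         |
| "Marshall" | "Cindy"   | "Using SPARQL with non-RDF Data" |
| "Ellis"    | "Craig"   | "Using SPARQL with non-RDF Data" |
-------------------------------------------------------------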

Workarounds:
1/ Download the file using curl or wget as suggested
2/ Set the base on the command line with
--base http://www.learningsparql.com/2ndeditionexamples/ex069.ttl
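
For workaround 1, a minimal Python sketch of the download step, as an alternative to curl or wget. The helper names `local_name` and `download` are made up for illustration. The idea is only to keep the `.ttl` extension on the local copy so that the syntax can be taken from the file name instead of the "text/plain" Content-Type. How the query then refers to the local copy is not covered here.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def local_name(url: str) -> str:
    """Return the last path segment of the URL, to use as the local file name."""
    name = Path(urlparse(url).path).name
    if not name:
        raise ValueError(f"URL has no file name: {url}")
    return name


def download(url: str, dest_dir: Path = Path(".")) -> Path:
    """Fetch the URL and save it under dest_dir, keeping the original file name."""
    target = dest_dir / local_name(url)
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target


# Example call (needs network access):
# download("http://www.learningsparql.com/2ndeditionexamples/ex069.ttl")
```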


The message

ERROR StatusLogger Reconfiguration failed: No configuration found for
'73d16e93' at 'null' in 'null'

is unrelated.

It is the command not finding the logging setup - I don't know why that
is happening.

Try copying the log4j2.properties from the distribution directory into
the current directory.

Andy

On 05/02/2024 13:06, Zlatareva, Neli (Computer Science) wrote:
> Hi Rob, thank you so much for the quick response. What made me wonder was 
> that this same FROM from arq on command line worked perfectly fine in the 
> past (was able to access remote files). However, I assume that for different 
> reasons (security?) this is not the case anymore.
> Truly appreciate the help.
> Thanks.
> Regards, Neli.
> 
> From: Rob @ DNR 
> Sent: Monday, February 5, 2024 6:32 AM
> To: users@jena.apache.org 
> Subject: Re: question about FROM keyword
>
> So, there’s a couple of things happening here.
>
> Firstly, Jena’s SPARQL engine always treats FROM (and FROM NAMED) as 
> referring to graphs in the local dataset.  So, it doesn’t matter that the URL 
> in your FROM is a valid RDF resource on the web, Jena won’t try and load that 
> by default, it just looks for a gr

Restart during Fuseki compaction

2024-02-06 Thread Samuel Börlin
Hi everybody,

I recently noticed that when Fuseki (4.10.0) is stopped during a compaction 
task (started via the HTTP endpoint `/$/compact/{name}?deleteOld=true`)
then it uses the new and still incomplete database (e.g. Data-0002 instead of 
the original non-compacted Data-0001) when it is started again.
Is there a way to do compaction in an atomic manner so that this doesn't happen?

As a workaround I'm currently thinking about simply deleting (or perhaps 
renaming/moving) all Data- directories but the one with the lowest index 
when the database is started.
I always use `?deleteOld=true`, so I only ever expect there to be one Data- 
directory when it starts. If there are multiple directories then that means 
that there must have been an incomplete compaction.
Does this seem like a reasonable approach?
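
A minimal sketch of this workaround in Python, to be run only while Fuseki is stopped. It moves the extra directories aside instead of deleting them, so nothing is lost if the assumption is wrong. The function name `quarantine_incomplete` and the quarantine location are made up for illustration, and the sketch assumes that the lowest-numbered Data- directory is always the complete one (which only holds when `?deleteOld=true` is always used).

```python
import re
import shutil
from pathlib import Path

# Matches storage directories such as Data-0001 or Data-0002.
DATA_DIR = re.compile(r"Data-(\d+)")


def quarantine_incomplete(db_dir: Path, quarantine_dir: Path) -> list[Path]:
    """Move every Data-NNNN directory except the lowest-numbered one
    out of db_dir into quarantine_dir. Returns the new locations."""
    generations = []
    for path in db_dir.iterdir():
        match = DATA_DIR.fullmatch(path.name)
        if path.is_dir() and match:
            generations.append((int(match.group(1)), path))
    generations.sort(key=lambda item: item[0])

    moved = []
    # Assumption: the first (lowest index) directory is the complete database.
    for _, path in generations[1:]:
        quarantine_dir.mkdir(parents=True, exist_ok=True)
        target = quarantine_dir / path.name
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved


# Example call with hypothetical paths, before starting Fuseki:
# quarantine_incomplete(Path("run/databases/mydb"), Path("run/quarantine/mydb"))
```

If more than one directory is moved, or the directory left behind is not the expected one, that would be worth checking by hand before starting the server.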

Thanks and best regards,
Samuel