>
> What kind of tuning besides the hardware was effective for you?
>
> Does anybody have experience with partial dumps created by
> https://tools.wmflabs.org/wdumps/?
>
> Cheers
>
> Wolfgang
>
> Am 20.05.20 um 11:22 schrieb Dick Murray:
> > That's a blast from the past!
> >
> > Not all of the details from that exchange are on the Jena list
Hi Apache Jena users,
>
> Some 2 years ago Laura Morales and Dick Murray had an exchange on this
> list on how to influence the performance of
> tdbloader. The issue is currently of interest for me again in the context
> of trying to load some 15 billion triples from a
> copy o
Hi.
Is it possible to natively detect whether a write has occurred to a
DatasetGraph since a particular epoch?
For the purposes of caching if I perform an expensive read from a
DatasetGraph knowing whether I need to invalidate the cache is very useful.
Does TDB or the Mem natively track if a
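As far as I know Jena does not expose a public write epoch on DatasetGraph (TDB keeps internal generation state, but not as public API), so the usual workaround is a wrapper that bumps a counter on every mutating call and lets the cache compare counters. A minimal sketch of the idea; `MiniDatasetGraph` is a hypothetical stand-in for the mutating subset of org.apache.jena.sparql.core.DatasetGraph, not the real interface:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the mutating subset of DatasetGraph.
interface MiniDatasetGraph {
    void add(String quad);
    void delete(String quad);
}

// Wrapper that increments a generation number on every write, so a
// cache can compare generations instead of re-reading the data.
class EpochTrackingGraph implements MiniDatasetGraph {
    private final MiniDatasetGraph inner;
    private final AtomicLong epoch = new AtomicLong();

    EpochTrackingGraph(MiniDatasetGraph inner) { this.inner = inner; }

    public long epoch() { return epoch.get(); }

    @Override public void add(String quad) { inner.add(quad); epoch.incrementAndGet(); }
    @Override public void delete(String quad) { inner.delete(quad); epoch.incrementAndGet(); }
}

public class EpochDemo {
    public static void main(String[] args) {
        List<String> store = new ArrayList<>();
        MiniDatasetGraph base = new MiniDatasetGraph() {
            @Override public void add(String q) { store.add(q); }
            @Override public void delete(String q) { store.remove(q); }
        };
        EpochTrackingGraph g = new EpochTrackingGraph(base);
        long before = g.epoch();
        g.add("<s> <p> <o> <g>");
        // Cache is stale iff the epoch moved since it was filled.
        System.out.println(g.epoch() != before);  // true
    }
}
```

A cached read then records `g.epoch()` when it fills, and invalidates only when the current epoch differs.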
Be very careful using vmtouch, especially if you call -dl, as you could very
easily and quickly kill a system. I've used this tool on cloud VMs to
mitigate cycle times (think DBAN, due to the public nature of the hardware).
It's a fast way to end up with an irked OS thrashing around.
Dick
On Sun, 16 Dec 2018 19:57
ems that distribute SPARQL using Jena.
>
> Dick Murray has written a system called Mosaic that (I believe) uses
> Apache Thrift to distribute the lower-level (DatasetGraph) primitives that
> ARQ uses to execute SPARQL. An advantage over your plan might be that he
> isn't serializing
Slow needs to be qualified. Slow because you need to load 1MT in 10s? What
hardware? What environment? Are you loading a line based serialization? Are
you loading from scratch or appending?
D
On Mon, 19 Mar 2018, 10:51 Davide, wrote:
> Hi,
> What is the best way to perform
From an enterprise perspective HTTP is well supported, with years of
development in associated stacks such as load balancing etc. It also
allows devs to use different languages. That said, we also employ
Thrift-based DGs which allow direct access from Python etc. It doesn't
remove the overhead, it
On Mon, 12 Mar 2018, 09:27 Davide Curcio, wrote:
> Hi,
> I want to store a large amount of data inside the TDB server with the
>
Quantity or size on disk?
> Jena API. In my code, I retrieve data for each iteration, and so I need
> to store these data in TDB, but if I
> recreate the
node.
On 19 Jan 2018 13:56, "Andy Seaborne" <a...@apache.org> wrote:
On 18/01/18 16:48, Dick Murray wrote:
> Is it possible to get a Pair<String, String> lexvo (left) code/002 (right)
> from abbrev given the prefix map entry;
>
In Turtle
Is it possible to get a Pair lexvo (left) code/002 (right)
from abbrev given the prefix map entry;
lexvo http://lexvo.org/id/
and the URI;
http://lexvo.org/id/code/002
PrefixMapStd (actually base call) returns null because the call to;
protected Pair
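Presumably abbrev declines here because "code/002" is not a legal unescaped Turtle local name (the '/' is the problem), which is what Andy's reply about Turtle points at. If all you want is the raw longest-prefix split regardless of Turtle rules, a plain map scan does it; this sketch uses java.util maps rather than Jena's PrefixMap:

```java
import java.util.AbstractMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class PrefixSplit {
    // Longest-prefix split of a URI against a prefix map, ignoring
    // Turtle local-name rules entirely.
    static Map.Entry<String, String> split(Map<String, String> prefixes, String uri) {
        Map.Entry<String, String> best = null;
        for (Map.Entry<String, String> e : prefixes.entrySet()) {
            String ns = e.getValue();
            if (uri.startsWith(ns) && (best == null || ns.length() > best.getValue().length())) {
                best = e;
            }
        }
        if (best == null) return null;
        // Left: prefix name. Right: the remainder of the URI after the namespace.
        return new AbstractMap.SimpleEntry<>(best.getKey(), uri.substring(best.getValue().length()));
    }

    public static void main(String[] args) {
        Map<String, String> pm = new LinkedHashMap<>();
        pm.put("lexvo", "http://lexvo.org/id/");
        System.out.println(split(pm, "http://lexvo.org/id/code/002"));  // lexvo=code/002
    }
}
```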
That's one graph in many pieces and the owner of the graph should clearly
state what is what!
On 26 Dec 2017 20:28, "Laura Morales" wrote:
> Blank node identifiers are only limited in scope to a serialization of a
> particular RDF graph, i.e. the node _:b does not represent
On 26 Dec 2017 19:10, "Laura Morales" wrote:
> What is more, it gets bNode labels across files right (so using _:a in
> two files is two bNodes).
Thinking about this...
- if the files contain anonymous blank nodes (for example in Turtle), each
node (converted with RIOT)
That seems slow for the size.
We bulk load triples into Windows and get similar times to Centos/Fedora on
the same hardware.
You can hack the tdbloader2 to run on Windows as basically you're
exploiting the OS sort which on Windows is;
sort [/r] [/+n] [/m kilobytes] [/l locale]
How big? How many?
On 22 Dec 2017 8:37 pm, "Dimov, Stefan" wrote:
> Hi all,
>
> We have a project, which we’re trying to productize and we’re facing
> certain operational issues with big size files. Especially with copying and
> maintaining them on the productive cloud
On 18 December 2017 at 08:07, Laura Morales wrote:
> > They don't have index permutations spo, ops, pos, etc.
>
> Yes they have, what you're saying is wrong. See http://www.rdfhdt.org/hdt-
> binary-format/#triples That's what the .hdt.index file is about, to store
> more index
n but eventually I'll saturate it.
Sent: Tuesday, December 12, 2017 at 9:20 PM
From: "Dick Murray" <dandh...@gmail.com>
To: users@jena.apache.org
Subject: Re: Report on loading wikidata
tdbloader2
For anyone still following this thread ;-)
latest-truthy supposedly
Correct, Mosaic federates multiple datasets as one. At some point in a
query find [G]SPO will get called and Mosaic will concurrently call find on
each child dataset and return the set of results. The dataset can be memory
or TDB or Thrift (this one's another discussion) Mosaic doesn't care as
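The concurrent fan-out described above (call find on every child dataset and union the results) can be sketched with an ExecutorService; the child datasets here are just lists of strings standing in for DatasetGraph.find results, not Mosaic's actual code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FederatedFind {
    // Run the same find against every child concurrently, union the results.
    static List<String> findAll(List<List<String>> children, String pattern) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, children.size()));
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (List<String> child : children) {
                tasks.add(() -> {
                    List<String> hits = new ArrayList<>();
                    for (String quad : child) if (quad.contains(pattern)) hits.add(quad);
                    return hits;
                });
            }
            List<String> out = new ArrayList<>();
            for (Future<List<String>> f : pool.invokeAll(tasks)) out.addAll(f.get());
            return out;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> children = List.of(
            List.of("s1 p o g1", "s2 p o g1"),
            List.of("s1 p o g2"));
        System.out.println(findAll(children, "s1").size());  // 2
    }
}
```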
We "hand" a transaction around using a ThreadProxy, which is basically a
wrapper around an ExecutorService which does one thing at a time. You
create it then give it to one or more threads which submit things to do and
it returns Futures. We extend it to implement Transactional so it works
with
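The ThreadProxy described above, a wrapper around an ExecutorService that does one thing at a time, is essentially a single-thread executor. A minimal sketch of the pattern (the Transactional part is omitted and the class name is illustrative, not the actual code):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// One worker thread executes everything submitted, in order, so all
// transaction work happens on the same thread regardless of caller.
public class ThreadProxySketch {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    public <T> Future<T> submit(Callable<T> task) { return worker.submit(task); }
    public void shutdown() { worker.shutdown(); }

    public static void main(String[] args) throws Exception {
        ThreadProxySketch proxy = new ThreadProxySketch();
        Future<Long> a = proxy.submit(() -> Thread.currentThread().getId());
        Future<Long> b = proxy.submit(() -> Thread.currentThread().getId());
        // Both tasks ran on the same worker thread.
        System.out.println(a.get().equals(b.get()));  // true
        proxy.shutdown();
    }
}
```

This matters for TDB because a transaction is bound to the thread that began it; funnelling all work through one worker lets many caller threads share it.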
Sent: Monday, December 11, 2017 at 11:31 AM
From: "Dick Murray" <dandh...@gmail.com>
To: users@jena.apache.org
Subject: Re: Report on loading wikidata
Inline...
On 10 December 2017 at 23:03, Laura Morales <laure...@mail.com> wrote:
> Thank you a lot Dick! Is this
Understand, I'm running sort and uniq on truthy out of interest...
On 12 December 2017 at 10:31, Andy Seaborne <a...@apache.org> wrote:
>
>
> On 12/12/17 10:06, Dick Murray wrote:
> ...
>
>> As an aside there are duplicate entries in the data-triples.tmp file, is
>
ms with tdbloader2 with complex --sort-args (it
> only handles one single arg/value correctly). My main trick was to put in
> a script for "sort" that had the required settings built-in. I wanted to
> set --compress, -T and the buffer size.
>
> On 10/12/17 21:18, Dick Mu
Inline...
On 10 December 2017 at 23:03, Laura Morales wrote:
> Thank you a lot Dick! Is this test for tdbloader, tdbloader2, or
> tdb2.tdbloader?
>
> > 32GB DDR4 quad channel
>
> 2133 or higher?
>
2133
> > 3 x M.2 Samsung 960 EVO
>
> Are these PCI-e disks? Or SATA? Also,
Ryzen 1920X 3.5GHz, 32GB DDR4 quad channel, 3 x M.2 Samsung 960 EVO,
172K/sec 3h45m for truthy.
Is it possible to split the index files into separate folders?
Or sym link the files, if I run the data phase, sym link, then run the
index phase?
Point me in the right direction and I'll extend the
nd tdbloader2data.
>
> ajs6f
>
> > On Dec 6, 2017, at 2:50 PM, Dick Murray <dandh...@gmail.com> wrote:
> >
> > TDB Loader 2, where does it call the Unix sort please? I'm obviously
> > looking too hard!
> >
> > TDB2 Loader does a simple .add(Quad)? I'm not missing something?
> >
> > Dick.
>
>
TDB Loader 2, where does it call the Unix sort please? I'm obviously
looking too hard!
TDB2 Loader does a simple .add(Quad)? I'm not missing something?
Dick.
Hello.
On 2 Dec 2017 8:55 pm, "Andy Seaborne" wrote:
Short story I used the following "reasonable" device
>
> Dell M3800
> Fedora 27
> 16GB SODIMM DDR3 Synchronous 1600 MHz
> CPU cache L1/256KB,L2/1MB,L3/6MB
> Intel(R) Core(TM) i7-4702HQ CPU @ 2.20GHz
Hi.
Sorry for the delay :-)
Short story I used the following "reasonable" device
Dell M3800
Fedora 27
16GB SODIMM DDR3 Synchronous 1600 MHz
CPU cache L1/256KB,L2/1MB,L3/6MB
Intel(R) Core(TM) i7-4702HQ CPU @ 2.20GHz 4 cores 8 threads
to load part of the latest-truthy.nt from
LOL, there's lots of things where I'd like to "move the problem elsewhere".
I've achieved concurrent 120K on the server hardware but it depends on the
input. There's another recent Jena thread regarding sizing and that's tied
up with what's in the input. I see the same thing with loading data,
This is probably me but...
I've got a collection of import errors in my Jena 3.2.0-rc1 fork, the
common issue being the import prefix "org.apache.jena.ext"...
i.e. import org.apache.jena.ext.com.google.common.cache.Cache ; in jena-arq
FactoryRDFCaching
I've checked the github apache jena
I've seen this type of statement in regard to Oracle whereby a materialized
query is disk based and updated periodically based on the query. It's
useful in BI where you don't require the latest data. As to RDF the closest
I can parallel is persisting inference (think RDFS subclass of i.e. A -> B
It is for this reason that I use and as a nod to my Cisco
engineer days and example.org... :-)
As Martynas Jusevičius said give it a little thought.
On 12 April 2017 at 17:37, Martynas Jusevičius
wrote:
> It would not be an error as long it is a valid URI.
>
>
I use "urn:ex:..." in a lot of my test code (short for "urn:example:").
Then the predicate is "urn:ex:time/now" or "urn:ex:time/duration" or
whatever you need...
On 12 April 2017 at 09:49, Laura Morales wrote:
> > The question is a bit unclear. If there is no existing
I think that worked;
wants to merge 1 commit into apache:master from dick-twocows:master
if so I'll compile the Thrift file and commit that too...
On 5 April 2017 at 19:28, Andy Seaborne <a...@apache.org> wrote:
> Should be - let's try it!
>
> Andy
>
>
> On 05/
the appropriate handler to
transform and load.
On 5 April 2017 at 12:54, Andy Seaborne <a...@apache.org> wrote:
>
>
> On 04/04/17 20:26, Dick Murray wrote:
>
>> I'd be happy to supply the current code we have, just need to get the
>> current project delivered (classic
..@apache.org> wrote:
On 04/04/17 19:02, Dick Murray wrote:
> Slightly lateral on the topic but we use a Thrift endpoint compiled against
> Jena to allow multiple languages to use Jena. Think interface supporting
> sparql, sparul and bulk load...
>
I'd like to put in binary vers
Slightly lateral on the topic but we use a Thrift endpoint compiled against
Jena to allow multiple languages to use Jena. Think interface supporting
sparql, sparul and bulk load...
On 3 Apr 2017 6:36 pm, "Martynas Jusevičius" wrote:
> By using uniform protocols such as
On 26 Mar 2017 5:20 pm, "Laura Morales" wrote:
- Is Jena a "native" store? Or does it use some other RDBMS/NoSQL backends?
It has memory, TDB and SDB (I'm not sure of the current state)
- Has anybody ever done tests/benchmarks to see how well Jena scales with
large datasets
rpose. I'm not actually sure we have a good
> non-blocking method for your use right now. We have inTransaction(), but
> that's not too helpful here.
>
> But someone else can hopefully point to a technique that I am missing.
>
>
> ---
> A. Soroka
> The University of Virginia
Hi.
Is there a way to get what Transactional a DatasetGraph is using and
specifically what Lock semantics are in force?
As part of a distributed DatasetGraph implementation I have a
DatasetGraphTry wrapper which adds Boolean tryBegin(ReadWrite) and as the
name suggests it will try to lock the
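A tryBegin(ReadWrite) that fails fast instead of blocking maps naturally onto ReadWriteLock.tryLock(). This is a self-contained sketch of those semantics, not the actual DatasetGraphTry code:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class TryBeginSketch {
    enum ReadWrite { READ, WRITE }

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Returns false immediately instead of blocking when the lock is held.
    boolean tryBegin(ReadWrite mode) {
        return mode == ReadWrite.READ ? lock.readLock().tryLock()
                                      : lock.writeLock().tryLock();
    }

    void end(ReadWrite mode) {
        if (mode == ReadWrite.READ) lock.readLock().unlock();
        else lock.writeLock().unlock();
    }

    public static void main(String[] args) throws Exception {
        TryBeginSketch ds = new TryBeginSketch();
        System.out.println(ds.tryBegin(ReadWrite.WRITE));  // true
        // A second writer on another thread fails fast instead of blocking.
        Thread other = new Thread(() ->
            System.out.println(ds.tryBegin(ReadWrite.WRITE)));  // false
        other.start();
        other.join();
        ds.end(ReadWrite.WRITE);
    }
}
```

Note that ReentrantReadWriteLock is reentrant per thread, so the same thread calling tryBegin(WRITE) twice would succeed; the fail-fast behaviour only shows across threads.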
Hi.
Question regarding the design thoughts behind Context and the callbacks.
Also merging BNodes...
I have implemented a Thrift based RPC DatasetGraph consisting of a Client
(implements DatasetGraph) which forwards calls to an IFace (generated from
a Thrift file which closely mimics the
much less of an issue to
> find Linux testers. Windows seems to be generally the hardest platform to
> get results for. I certainly didn't intend any more than that, but I copied
> that list from earlier release vote announcements. (!)
>
> But maybe I am missing some history?
>
> a
Hi.
Under checking, Windows and Mac OSs are listed but not Linux. Is Jena
assumed to pass? I'm running Jena 3.2 snapshot on Ubuntu 16.04 and
Centos 7.
If you haven't broken anything in the snapshot then I vote release. ;-)
On 1 Feb 2017 16:09, "A. Soroka" wrote:
>
by accident because it has a field.
>
> Initialization is in org.apache.jena.system.SerializerRDF, which is
> called from InitRIOT which is called by system initialization based on
> ServiceLoader.
>
> Andy
>
>
> On 20/01/17 18:35, Dick Murray wrote:
>
>> Whilst this
Whilst this issue is reported and possibly caused by Kryo I think it's my
understanding of how Jena is or is not serializing...
I'm using Jena 3.2.0-SNAPSHOT and Kryo(Net) to serialize Jena nodes but
Kryo baulks when asked to handle a (the) Node_ANY;
Exception in thread "Server"
Sorry, that should have been "not" asked on the Jena user group...
On 18 Jan 2017 7:09 pm, "Dick Murray" <dandh...@gmail.com> wrote:
You need to learn the difference between == and .equals().
Please read up on basic Java skills! These questions should be asked on the
Jena user group...
On 18 Jan 2017 1:14 pm, "Sidra shah" wrote:
Hello Lorenz, it's not giving me the exception now but it does not
Google for example RDF datasets in a serialisation supported by Jena.
A web search really is your best friend for this...
On 15 Jan 2017 3:13 pm, "kumar rohit" wrote:
I want to know, like DBpdia, what are other sources where we can get data
from and supported also by
An example rule which you can test and then expand on is;
[Manager: (?E rdf:type NS:Employee), (?E NS:netSalary ?S), greaterThan(?S,
5000) -> (?E rdf:type NS:Manager)]
Also see https://jena.apache.org/documentation/inference/
On 12 Jan 2017 19:15, "tina sani" wrote:
x10 compression so applied
> to RDD data I'd expect that or more.
>
> There are line based output formats (I don't know if they work with
> Elephas - no reason why not in principle).
>
> http://jena.apache.org/documentation/io/rdf-output.html#
> line-printed-formats
>
> Se
a
The University of Virginia Library
> On Dec 21, 2016, at 2:17 PM, Dick Murray <dandh...@gmail.com> wrote:
>
> Hi, on a similar vein I have a modified NTriple reader which uses a prefix
> file to reduce the file size. Whilst the serialisation allows parallel
> processi
Hi, on a similar vein I have a modified NTriple reader which uses a prefix
file to reduce the file size. Whilst the serialisation allows parallel
processing in spark the file sizes were large and this has reduced them to
1/10 the original size on average.
There is not an existing line based
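The prefix-file trick described above (rewrite each full IRI as a prefixed name so the file shrinks but stays line-based) can be sketched as a per-line token substitution. The prefix table and names are illustrative, not the actual reader:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PrefixCompress {
    // Rewrite each <full-iri> token as prefix:local when a namespace matches.
    // One line in, one line out, so the output stays line-based and still
    // splits cleanly for parallel processing in Spark.
    static String compress(String line, Map<String, String> prefixes) {
        StringBuilder out = new StringBuilder();
        for (String tok : line.split(" ")) {
            String t = tok;
            if (tok.startsWith("<") && tok.endsWith(">")) {
                String iri = tok.substring(1, tok.length() - 1);
                for (Map.Entry<String, String> e : prefixes.entrySet()) {
                    if (iri.startsWith(e.getValue())) {
                        t = e.getKey() + ":" + iri.substring(e.getValue().length());
                        break;
                    }
                }
            }
            if (out.length() > 0) out.append(' ');
            out.append(t);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> pm = new LinkedHashMap<>();
        pm.put("lex", "http://lexvo.org/id/");
        System.out.println(compress(
            "<http://lexvo.org/id/a> <http://lexvo.org/id/b> \"x\" .", pm));
        // lex:a lex:b "x" .
    }
}
```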
Excellent, I was wrapping and unwrapping as Strings, which fixed
another issue along with prefixing bnodes to remove clashes between TDBs.
I'll pull and refactor my code...
On 17 Dec 2016 20:03, "Andy Seaborne" wrote:
Related:
Jena now provides "Serializable" for
Just posted a question regarding Spark because I'm heading down the
streaming route as we're aggregating multiple large datasets together and
our 1.5TB TDB was causing us some issues. We have many large graph writes
of between 1-4M triples which I currently write to a number of TDBs and
use a
s available in sub shells.
>
> export JAVA_HOME= /usr/lib/jvm/jre-1.8.0-openjdk.x86_64/bin/java
>
> I have read an issue (JENA-1035), that fuseki-server script ignores
> JAVA_HOME variable while it executes the "java" command.
> Has it been fixed?
>
Unknown, as i d
t JAVA_HOME=c:\jre8
> Thanks and best regards,
> Sandor
>
>
> Am 19.10.2016 um 15:36 schrieb Dick Murray:
>
>> Hi.
>>
>> Check what version of JRE you have with java -version
>>
>> dick@Dick-M3800:~$ java -version
>> java version "1.8.0_1
Hi.
Check what version of JRE you have with java -version
dick@Dick-M3800:~$ java -version
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
Your exception should say what version it is having trouble
le}/log4j.properties
Is it worth noting this behaviour somewhere?
Thanks for the point in the right direction.
Dick.
On 17 October 2016 at 21:33, Andy Seaborne <a...@apache.org> wrote:
>
>
> On 17/10/16 21:13, Dick Murray wrote:
>
>> Hi.
>>
>> On 17 Oct 2016 18:16, &qu
On 17 Oct 2016 21:33, "Andy Seaborne" <a...@apache.org> wrote:
>
>
>
> On 17/10/16 21:13, Dick Murray wrote:
>>
>> Hi.
>>
>> On 17 Oct 2016 18:16, "Andy Seaborne" <a...@apache.org> wrote:
>>>
>>>
>>> Are
On 17 Oct 2016 21:33, "Andy Seaborne" <a...@apache.org> wrote:
>
>
>
> On 17/10/16 21:13, Dick Murray wrote:
>>
>> Hi.
>>
>> On 17 Oct 2016 18:16, "Andy Seaborne" <a...@apache.org> wrote:
>>>
>>>
>>> Are
Hi.
I'm getting odd behaviour in Jena when I execute the same query
concurrently.
The query has an optional which is unmatched but which appears to cause a
java.lang.String exception from the atlas code.
This only happens if multiple queries are submitted concurrently and
closely. On a "fast"
-08-08T10:05:04.216Z]/[PT1M50.931S]] Value
[org.iungo.result.Result]
*Dick Murray*
Technology Specialist
*Business Collaborator Limited*
9th Floor, Reading Bridge House, George Street, Reading, RG1 8LS, United
Kingdom
T 0044 7884 111729 *|* E dick.mur...@groupbc.com
:10, "Andy Seaborne" <a...@apache.org> wrote:
>
> On 27/07/16 13:19, Dick Murray wrote:
>>
>> ;-) Yes I did. But then I switched to the actual files I need to import
and
>> they produce ~3.5M triples...
>>
>> Using normal Jena 3.1 (i.e. no speci
...
Just before hitting send I'm at pass 13 and the [B maxed at just over 4Gb
before dropping back to 2Gb.
Dick.
On 27 July 2016 at 11:47, Andy Seaborne <a...@apache.org> wrote:
> On 27/07/16 11:22, Dick Murray wrote:
>
>> Hello.
>>
>> Something doesn't add up h
[I
 8:    310  27899112  [Ljava.util.HashMap$Node;
 9: 935412  22449888  java.lang.Long
10: 328196  18378976  java.nio.ByteBufferAsIntBufferB
2016-07-27T09:52:49.082Z begin WRITE
Exception: java.lang.OutOfMemoryError thrown from the
UncaughtExceptionHandler in thread "ma
t gave up on F8 after counting to 500
>>>
>>> Dick.
>>>
>>
>> Make sure you have all the dependencies successfully resolved with
>> mvn -o dependency:tree.
>>
>> The Apache snapshot repo was having a bad day earlier and Multiset is
>> from
rted to RDF, and loaded with tdbloader?
>
> If TDB is using DirectByteBuffers, have you set
> "transactionJournalWriteBlockMode" to "direct"?
>
> You need to increase the direct memory space, not the heap.
>
> Andy
>
>
> On 26/07/16 10:14, Dick Murray
Hi.
I've got a repeatable problem with Jena 3.1 when performing a bulk load.
The bulk load converts a file into ~200k quads and adds them to a TDB
instance within a normal begin write, add quads and commit. Initially this
completes in 30-40 seconds, However if I repeat the process (with the same
Hi.
I've pushed up a draft to https://github.com/dick-twocows/jena-dev.git.
This has two test cases;
Echo : which will echo back the find GSPO call i.e. call find ABCD and you
will get the Quad ABCD back. This does not cache between calls.
CSV : which will transform a CSV file into Quads i.e.
> ... but it seems that it's returning as a factor
>
> inline lambdas are apparently faster than the same code with a class
> implementation - the compiler emits an invokedynamic for the lambda
>
> and Java Stream can cause a lot of short-lived objects.
>
> Andy
>
Eureka moment! It returns a new Graph of a certain type. Whereas I need the
graph node to determine where the underlying data is.
Cheers Dick.
On 15 March 2016 at 11:28, Andy Seaborne <a...@apache.org> wrote:
> On 15/03/16 10:30, Dick Murray wrote:
>
>> Sorry, supportsTransac
age
>> From: Andy Seaborne <a...@apache.org>
>> Date: 13/03/2016 7:54 pm (GMT+00:00)
>> To: users@jena.apache.org
>> Subject: Re: SPI DatasetGraph creating Triples/Quads on demand using
>>DatasetGraphInMemory
>>
>> On 10/03/16 20:10, Dic
> default boolean contains(Node s, Node p, Node o)
>{ return find(s,p,o).findAny().isPresent() ; }
>default boolean contains(Node g, Node s, Node p, Node o)
>{ return find(g,s,p,o).findAny().isPresent() ; }
>
> // Prefixes ??
> }
>
>
> https://github.com/afs/AFS-Dev/tree/master/src/main/java/projects/dsg
> appropriately-ordered index based on the fixed and variable
> slots in the find pattern and using the concrete methods above to stream
> tuples back.
> >>>>
> >>>> As to why you are seeing your methods called in some places and not
> in others, DatasetGraphBaseFi
Hi I'm trying to get the 3.0.1 to build using eclipse/maven but it refuses
to "find" it in the central repository.
I ran mvn dependency:get
-Dartifact=org.apache.jena:apache-jena-libs:jar:3.0.1 and got the
following...
Am I missing something..?
[INFO] Resolving
I might be confusing the DynamicDataset...
Dick
On 20 October 2014 20:40, Dick Murray dandh...@gmail.com wrote:
Thanks that confirms what I thought.
Crazy idea time!
Am I correct in thinking that there is a dataset view which allows you
to present multiple datasets as one? I'm sure I saw
Hello all.
Are there any pointers to inserting large volumes of data in a persistent
RW TDB store please?
I currently have a 8M line 500MB+ input file which is being parsed by
JavaCC and the created quads inserted into a TDB store.
The process generates 120M quads and takes just over 2hrs which
estimated
that it will grow by the same amount every working day which equates to
31,200M or 31B triples and 13,780GB or 14TB on disk in a year...
Dick
On 20 Oct 2014 17:56, Andy Seaborne a...@apache.org wrote:
On 20/10/14 10:12, Dick Murray wrote:
Hello all.
Are there any pointers to inserting
, Andy Seaborne a...@apache.org wrote:
On 18/06/13 18:22, Dick Murray wrote:
I'm looking for dynamic inference based on the select. Given a dataset
with
multiple named graphs I would like the ability to wrap specific named
graphs based on some form of filter when the select is processed
together would be good. It
seems to be querying the dataset without the inference graph. I don't see
where you query the dataset (and which one)
if (graphNode.getURI().equals(types.getURI())) {
if (graphNode.equals(types.asNode())) {
On 18/06/13 14:22, Dick Murray wrote:
Hi
. update before adding.
Andy
A TDB dataset is a triples+quad store. You can't add an in-memory
storage-backed graph to a dataset backed by TDB.
If you want a mixed dataset, you can create an in-memory dataset and add
in TDB backed graphs.
Andy
On 28/11/12 20:32, Dick Murray
);
}
}
Hope this helps,
Rob
On 10/29/12 6:43 AM, Dick Murray dandh...@gmail.com wrote:
Hi all
I need to permit/deny certain SPARUL update operations e.g. deny create|
drop graph.
I've looked at the UpdateEngineMain and UpdateVisitor classes and was
wondering if anyone has extended