Re: TDB optimization query

Amandeep Srivastava Wed, 27 Nov 2019 21:45:32 -0800

Thanks Andy, setting it that way worked.
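
For anyone finding this thread later, here is a rough sketch of the two settings from Andy's reply quoted below. The scratch directory name is a placeholder I made up, and the loader line is commented out with placeholder paths; either of the two variables should be enough on its own, going by the reply.

```shell
# Hypothetical scratch directory on a volume with enough free space
# (the name is an assumption, pick any location that suits your machine).
SCRATCH_DIR="${SCRATCH_DIR:-$PWD/tdb-sort-tmp}"
mkdir -p "$SCRATCH_DIR"

# Option 1: tdbloader2 respects TMPDIR.
export TMPDIR="$SCRATCH_DIR"

# Option 2: hand the directory to sort through SORT_ARGS
# (--temporary-directory= is the long form of sort's -T).
export SORT_ARGS="--temporary-directory=$SCRATCH_DIR"

# Then run the loader as usual (both paths are placeholders):
# tdbloader2 --loc /path/to/DB /path/to/dump.nt
```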

Also, can we turn on verbose logging in tdb2.tdbloader, like we have in
tdbloader2? Basically, output showing how many triples have been loaded and
how much time has elapsed so far.

On Thu, 14 Nov, 2019, 2:20 PM Andy Seaborne, <[email protected]> wrote:

> Firstly - just to be clear - tdb.tdbloader2 is (confusingly) for TDB1.
> Old name, before TDB2 came along so we're a bit stuck with it.
>
> tdbloader2 respects the $TMPDIR environment variable.
>
> Or set the SORT_ARGS environment variable with --temporary-directory=
> (or -T). See tdbloader2 --help
>
>      Andy
>
> On 14/11/2019 02:54, Amandeep Srivastava wrote:
> > I was trying to test the performance of tdb.tdbloader2 by creating a TDB
> > database. The loader failed at the sort SPO step. The failure seems to
> > occur because of insufficient storage in the /tmp folder. Can we point
> > tdb to another folder instead of /tmp?
> >
> > Error log:
> > sort: write failed: /tmp/sortxRql3B: No space left on device
> >
> > On Wed, 13 Nov, 2019, 5:37 PM Amandeep Srivastava, <
> > [email protected]> wrote:
> >
> >> Thanks, Andy, for the detailed explanation :)
> >>
> >> On Wed, 13 Nov, 2019, 4:52 PM Andy Seaborne, <[email protected]> wrote:
> >>
> >>>
> >>>
> >>> On 12/11/2019 15:53, Amandeep Srivastava wrote:
> >>>> Thanks for the heads up, Dan. Will go and check the archives.
> >>>>
> >>>> I think I should be able to work out how to decide between TDB1 and
> >>>> TDB2 from the archives themselves.
> >>>
> >>> For large bulk loads, the TDB2 loader is faster if you use
> >>> --loader=parallel (NB it can take over your machine's I/O!)
> >>>
> >>> See tdb2.tdbloader --help for names of plans that are built-in.
> >>>
> >>> The only way to know which is best is to try, but the order of
> >>> threading used is:
> >>>
> >>> sequential < light < phased < parallel
> >>>
> >>> (more threads does not always mean faster).
> >>>
> >>> sequential is roughly the same as the TDB1 bulk loader.
> >>>
> >>> parallel usually wins as data gets larger (several hundred million
> >>> triples) if the machine has the I/O to handle it.
> >>>
> >>>       Andy
> >>>
> >>>>
> >>>> On Tue, 12 Nov, 2019, 8:59 PM Dan Pritts, <[email protected]> wrote:
> >>>>
> >>>>> Look through the list archives for posts from Andy describing the
> >>>>> differences between TDB1 and TDB2. They have different optimizations;
> >>>>> I don't recall the differences.
> >>>>>
> >>>>> thanks
> >>>>> danno
> >>>>>
> >>>>> Dan Pritts
> >>>>> ICPSR Computing and Network Services
> >>>>>
> >>>>> On 12 Nov 2019, at 7:29, Amandeep Srivastava wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> I'm trying to create a TDB database from Wikidata's official RDF dump
> >>>>>> so that I can read the data through a Fuseki service. I need to run a
> >>>>>> few queries for a personal project, and they time out on the online
> >>>>>> service.
> >>>>>>
> >>>>>> I have a 12 core machine with 36 GB memory.
> >>>>>>
> >>>>>> Can you please advise on the best way to create the database? Since the
> >>>>>> dump is huge, I cannot try all the approaches. Besides, I'm not sure
> >>>>>> whether tdbloader behaves the same way on data of different sizes.
> >>>>>>
> >>>>>> Questions:
> >>>>>>
> >>>>>> 1. Which one would be better to use for creating the database,
> >>>>>> tdb.tdbloader2 (TDB1) or tdb2.tdbloader (TDB2), and why? Are there any
> >>>>>> specific configurations that I should be aware of?
> >>>>>>
> >>>>>> 2. I'm currently running a job using tdb.tdbloader2, but it is using
> >>>>>> just a single core. Also, its loading speed is slowly decreasing: it
> >>>>>> started at an average of 120k tuples and is currently at 80k tuples.
> >>>>>> Can you advise how I can utilize all the cores of my machine and
> >>>>>> maintain the loading speed at the same time?
> >>>>>>
> >>>>>> Regards,
> >>>>>> Aman
> >>>>>
> >>>>
> >>>
> >>
> >
>
