Re: How to send custom header per request

2018-09-14 Thread Niraj Patel
Thanks for the replies!

> For the future, Andy's suggestion of a "hook" in HTTP execution for 
> manipulating requests sounds like a good one. Would that meet your needs,
> and if so, will you please file a ticket for it?
Yes, that would be great! Y’all already have a hook for responses so one for 
requests would fit that pattern. I will file a ticket.

Niraj Patel
On Sep 14, 2018, 8:16 AM -0500, users@jena.apache.org wrote:
>
> For the future, Andy's suggestion of a "hook" in HTTP execution for 
> manipulating requests sounds like a good one. Would that meet your needs,


Re: android Jena

2018-09-14 Thread elio hbeich
>
>
> Hello,
>
> I have generated an RDF/XML file from Protégé.
> I am trying to load it in Android Studio using the following code:
>
> Model m = FileManager.get().loadModel("C:\\Users\\Toshiba\\AndroidStudioProjects\\MyApplication\\app\\sampledata\\MallOntology.owl");
>
> but I always get that the file does not exist.
>
> I am trying to insert data fetched from Facebook login; the source code is:
>
> Model m = FileManager.get().loadModel("C:\\Users\\Toshiba\\AndroidStudioProjects\\MyApplication\\app\\sampledata\\MallOntology.owl");
>
> // Note: IRIs in angle brackets were stripped by the list archive; <...>
> // marks where they were lost. The standard W3C namespaces are restored.
> String updateString =
>     "PREFIX ns: <...>\n" +
>     "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n" +
>     "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n" +
>     "PREFIX xml: <http://www.w3.org/XML/1998/namespace>\n" +
>     "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n" +
>     "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n" +
>     "PREFIX ts: <...>\n" +
>     "PREFIX ds: <...>\n" +
>     "\n" +
>     "INSERT DATA {\n" +
>     "  <..." + ID + "> rdf:type ds:User .\n" +
>     "  <..." + ID + "> rdf:type ds:Personal .\n" +
>     "  <..." + ID + "> rdf:type ts:NamedIndividual .\n" +
>     "  <..." + ID + "> ns:First_Name \"" + FN + "\" .\n" +
>     "  <..." + ID + "> ns:Last_Name \"" + LN + "\" .\n" +
>     // "  <..." + ID + "> ns:Gender \"" + gender.getText().toString() + "\" .\n" +
>     "  <..." + ID + "> ns:Username \"" + um + "\" .\n" +
>     "  <..." + ID + "> ns:Email \"" + em + "\" .\n" +
>     // "  <..." + ID + "> ns:Date_Of_Birth \"" + birthday + "\" .\n" +
>     "}";
>
> UpdateAction.parseExecute(updateString, m);
>
> try {
>     m.write(new FileOutputStream("C:\\Users\\Toshiba\\AndroidStudioProjects\\MyApplication\\app\\sampledata\\MallOntology.owl"), "RDF/XML");
> } catch (FileNotFoundException e) {
>     e.printStackTrace();
> }
>
> Do you have any idea how I can solve the problem?
>
> The AndroidManifest.xml code is (the element tags were stripped by the list
> archive; they are restored below, with lost values marked):
>
> <?xml version="1.0" encoding="utf-8"?>
> <manifest xmlns:android="http://schemas.android.com/apk/res/android"
>     package="com.example.toshiba.myapplication">
>
>     <uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE"/>
>     <!-- further <uses-permission> entries whose names were lost -->
>
>     <application
>         android:allowBackup="true"
>         android:icon="@mipmap/ic_launcher"
>         android:label="@string/app_name"
>         android:roundIcon="@mipmap/ic_launcher_round"
>         android:supportsRtl="true"
>         android:theme="@style/AppTheme">
>
>         <activity android:name=".MainActivity">
>             <intent-filter>
>                 <action android:name="android.intent.action.MAIN"/>
>                 <category android:name="android.intent.category.LAUNCHER"/>
>             </intent-filter>
>         </activity>
>
>         <meta-data
>             android:name="com.facebook.sdk.ApplicationId"
>             android:value="@string/facebook_app_id"/>
>         <meta-data
>             android:name="com.google.android.gms.version"
>             android:value="@integer/google_play_services_version"/>
>
>         <activity
>             android:name="com.facebook.FacebookActivity"
>             android:configChanges="keyboard|keyboardHidden|screenLayout|screenSize|orientation"
>             android:label="@string/app_name"/>
>         <activity
>             android:name="com.facebook.CustomTabActivity"
>             android:exported="true">
>             <intent-filter>
>                 <!-- action/category/data entries lost -->
>             </intent-filter>
>         </activity>
>
>         <activity
>             android:name=".Main2Activity"
>             android:label="@string/title_activity_main2"
>             android:theme="@style/AppTheme.NoActionBar"/>
>         <activity
>             android:name=".EventActivity"
>             android:label="@string/title_activity_event"
>             android:theme="@style/AppTheme.NoActionBar"/>
>         <activity
>             android:name=".PreferenceActivity"
>             android:label="@string/title_activity_preference"/>
>         <activity
>             android:name=".LFActivity"
>             android:label="@string/title_activity_lf"
>             android:theme="@style/AppTheme.NoActionBar"/>
>         <activity
>             android:name=".Main5Activity"
>             android:label="@string/title_activity_main5"
>             android:theme="@style/AppTheme.NoActionBar"/>
>     </application>
> </manifest>
>
> Thank you in advance
>
> Regards
>
> Elio 

Re: Updating large amounts of data

2018-09-14 Thread Andy Seaborne




On 14/09/18 14:18, Markus Neumann wrote:

Hi Andy,

thanks for pointing that out.
what would you recommend as a heap size?


From a pure storage point of view, 2-4G per dataset.

Query execution can take some workspace, as can HTTP handling and other Jena modules.

And Java itself.

And running at the bare minimum is to be avoided, as it can lead to excessive GCs.

Ultimately there is no choice but to try.

128GB Server? Nothing else of note using RAM?  Try 10-16G (as a guess).

Andy




On 14.09.2018 at 15:04, Andy Seaborne wrote:



On 12/09/18 16:08, Markus Neumann wrote:

Hi,
we are running a Fuseki server that will hold about 2.2 * 10^9 triples of 
meteorological data eventually.
I currently run it with "-Xmx80GB" on a 128GB Server. The database is TDB2 on a 
900GB SSD.


Not sure if this is mentioned later in the thread (I'm in catch-up mode) but 
for TDB/TDB2, a lot of the workspace isn't in the heap, it's the OS file system 
cache, so a bigger Java heap can actually slow things down.

Andy


Now I face several performance issues:
1. Inserting data:
It takes more than one hour to upload the measurements of a month 
(7.5GB .ttl file ~ 16 Mio triples), using the data-upload web interface of 
Fuseki.
Is there a way to do this faster?
2. Updating data:
We get new model runs 5 times per day. This is data for the next 10 
days, that needs to be updated every time.
My idea was to create a named graph "forecast" that holds the latest 
version of this data.
Every time a new model run arrives, I create a new temporary graph to upload the 
data to. Once this is finished, I move the temporary graph to "forecast".
This seems to do the work twice, as it takes 1 hour for the upload and 1 
hour for the move.
Our data consists of the following:
Locations (total 1607 -> 16070 triples):
mm-locations:8500015 a mm:Location ;
 a geosparql:Geometry ;
 owl:sameAs <...> ;
 geosparql:asWKT "POINT(7.61574425031 47.5425915732)"^^geosparql:wktLiteral 
;
 mm:station_name "Basel SBB GB Ost" ;
 mm:abbreviation "BSGO" ;
 mm:didok_id 8500015 ;
 geo:lat 47.54259 ;
 geo:long 7.61574 ;
 mm:elevation 273 .
Parameters (total 14 -> 56 triples):
mm-parameters:t_2m:C a mm:Parameter ;
 rdfs:label "t_2m:C" ;
 dcterms:description "Air temperature at 2m above ground in degree 
Celsius"@en ;
 mm:unit_symbol "˚C" .
Measurements (that is the huge bunch; 14 * 1607 * 48 ~ 1 Mio -> 5 Mio 
triples per day):
mm-measurements:8500015_2018-09-02T00:00:00Z_t_2m:C a mm:Measurement ;
 mm:location mm-locations:8500015 ;
 mm:validdate "2018-09-02T00:00:00Z"^^xsd:dateTime ;
 mm:value 15.1 ;
 mm:parameter mm-parameters:t_2m:C .
I would really appreciate it if someone could give me some advice on how to 
handle these tasks, or point out things I could do to optimize the organization 
of the data.
Many thanks and kind regards
Markus Neumann





Re: Updating large amounts of data

2018-09-14 Thread Dan Pritts
Note that you can use "pigz" to parallelize your gzip compression.  It 
scales roughly linearly with the number of CPU cores.


Similarly "pbzip2" for bzip2.

Unimportant here, but note that pbzip2 also decompresses in parallel if 
it did the compression. Normal bzip2 can still decompress such files.


Andy Seaborne wrote on 9/14/18 9:06 AM:



On 13/09/18 12:26, Markus Neumann wrote:

Hi Rob,

seems like Fuseki doesn't handle gzip. I created the file with `tar 
-cvzf tar_test.ttl.gz large_input.ttl` so it should be a standard gzip.


That will be a standard tar file that has been compressed with gzip.

Run gzip itself on the file:

gzip < large_input.ttl > large_input.ttl.gz




--
Dan Pritts
ICPSR Computing & Network Services
University of Michigan



Re: Updating large amounts of data

2018-09-14 Thread Markus Neumann
Hi Andy,

thanks for pointing that out.
what would you recommend as a heap size?

> On 14.09.2018 at 15:04, Andy Seaborne wrote:
> 
> 
> 
> On 12/09/18 16:08, Markus Neumann wrote:
>> Hi,
>> we are running a Fuseki server that will hold about 2.2 * 10^9 triples of 
>> meteorological data eventually.
>> I currently run it with "-Xmx80GB" on a 128GB Server. The database is TDB2 
>> on a 900GB SSD.
> 
> Not sure if this is mentioned later in the thread (I'm in catch-up mode) but 
> for TDB/TDB2, a lot of the workspace isn't in the heap, it's the OS file 
> system cache, so a bigger Java heap can actually slow things down.
> 
>Andy
> 
>> Now I face several performance issues:
>> 1. Inserting data:
>>  It takes more than one hour to upload the measurements of a month 
>> (7.5GB .ttl file ~ 16 Mio triples), using the data-upload web interface of 
>> Fuseki.
>>  Is there a way to do this faster?
>> 2. Updating data:
>>  We get new model runs 5 times per day. This is data for the next 10 
>> days, that needs to be updated every time.
>>  My idea was to create a named graph "forecast" that holds the latest 
>> version of this data.
>>  Every time a new model run arrives, I create a new temporary graph to 
>> upload the data to. Once this is finished, I move the temporary graph to 
>> "forecast".
>> This seems to do the work twice, as it takes 1 hour for the upload and 1 
>> hour for the move.
>> Our data consists of the following:
>> Locations (total 1607 -> 16070 triples):
>> mm-locations:8500015 a mm:Location ;
>> a geosparql:Geometry ;
>> owl:sameAs <...> ;
>> geosparql:asWKT "POINT(7.61574425031 
>> 47.5425915732)"^^geosparql:wktLiteral ;
>> mm:station_name "Basel SBB GB Ost" ;
>> mm:abbreviation "BSGO" ;
>> mm:didok_id 8500015 ;
>> geo:lat 47.54259 ;
>> geo:long 7.61574 ;
>> mm:elevation 273 .
>> Parameters (total 14 -> 56 triples):
>> mm-parameters:t_2m:C a mm:Parameter ;
>> rdfs:label "t_2m:C" ;
>> dcterms:description "Air temperature at 2m above ground in degree 
>> Celsius"@en ;
>> mm:unit_symbol "˚C" .
>> Measurements (that is the huge bunch; 14 * 1607 * 48 ~ 1 Mio -> 
>> 5 Mio triples per day):
>> mm-measurements:8500015_2018-09-02T00:00:00Z_t_2m:C a mm:Measurement ;
>> mm:location mm-locations:8500015 ;
>> mm:validdate "2018-09-02T00:00:00Z"^^xsd:dateTime ;
>> mm:value 15.1 ;
>> mm:parameter mm-parameters:t_2m:C .
>> I would really appreciate it if someone could give me some advice on how to 
>> handle these tasks, or point out things I could do to optimize the 
>> organization of the data.
>> Many thanks and kind regards
>> Markus Neumann
>>  



Re: How to send custom header per request

2018-09-14 Thread ajs6f


> On Sep 14, 2018, at 8:57 AM, Andy Seaborne  wrote:
> On 10/09/18 16:08, Niraj Patel wrote:
>> Thank you both for replying!
>>> Can you tell us more about your use case? Are the custom headers for some 
>>> one particular purpose?
>> Sure! So our database, Allegrograph, allows us to pass down custom headers 
>> while querying or updating in order to store that information in access 
>> logs. For each SPARQL request we want to send down unique request markers 
>> and usernames in order to be able to trace from a UI click to backend calls 
>> to queries that were performed in the graph. Does that make sense? Do y’all 
>> have any ideas now that y’all know the use case?
> 
> Makes sense.
> 
> Adding HTTP header information to track an operation end-to-end.
> 
> What might work is to have a point in the HttpOp execution flow that sees the 
> HttpGet/HttpPost/... request just before it is acted upon.

Yes, this use case is quite reasonable. A couple of immediate options:

1. You can repeatedly swap out the client while changing default headers, as 
Andy describes, and that may work fine, especially if you keep a pool of 
clients and don't keep building completely fresh ones (see the sketch after 
this list).

2. You can put some sort of proxy in place between your Jena client and 
Allegrograph, which adds the appropriate headers. That might or might not be 
workable depending on how you're sourcing the info to build the headers.

3. You can use low-level methods like building your own HTTP client, and 
Query::serialize to fill requests with your SPARQL. (There might be something 
better to use than Query::serialize for that.)
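
A minimal sketch of option 1, assuming the Jena 3.x QueryEngineHTTP API 
(including its setClient method) and Apache HttpClient; the endpoint and the 
header names/values are illustrative:

import java.util.Arrays;

import org.apache.http.client.HttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicHeader;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.sparql.engine.http.QueryEngineHTTP;

public class PerRequestHeaders {
    public static void main(String[] args) {
        // Client whose default headers carry the request marker; in practice,
        // pool and reuse such clients rather than building fresh ones each time.
        HttpClient client = HttpClients.custom()
                .setDefaultHeaders(Arrays.asList(
                        new BasicHeader("X-Request-Id", "req-1234"),  // illustrative
                        new BasicHeader("X-User", "alice")))          // illustrative
                .build();

        QueryEngineHTTP qexec = new QueryEngineHTTP(
                "http://localhost:3030/ds/sparql",                    // illustrative endpoint
                QueryFactory.create("SELECT * WHERE { ?s ?p ?o } LIMIT 1"));
        qexec.setClient(client);  // requests from this execution carry the headers
        qexec.execSelect().forEachRemaining(System.out::println);
        qexec.close();
    }
}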

For the future, Andy's suggestion of a "hook" in HTTP execution for 
manipulating requests sounds like a good one. Would that meet your needs, and 
if so, will you please file a ticket for it?

ajs6f




Re: Updating large amounts of data

2018-09-14 Thread Andy Seaborne




On 12/09/18 16:08, Markus Neumann wrote:

Hi,

we are running a Fuseki server that will hold about 2.2 * 10^9 triples of 
meteorological data eventually.
I currently run it with "-Xmx80GB" on a 128GB Server. The database is TDB2 on a 
900GB SSD.


Not sure if this is mentioned later in the thread (I'm in catch-up mode) 
but for TDB/TDB2, a lot of the workspace isn't in the heap, it's the OS 
file system cache, so a bigger Java heap can actually slow things down.


Andy



Now I face several performance issues:
1. Inserting data:
It takes more than one hour to upload the measurements of a month 
(7.5GB .ttl file ~ 16 Mio triples), using the data-upload web interface of 
Fuseki.
Is there a way to do this faster?
2. Updating data:
We get new model runs 5 times per day. This is data for the next 10 
days, that needs to be updated every time.
My idea was to create a named graph "forecast" that holds the latest 
version of this data.
Every time a new model run arrives, I create a new temporary graph to upload the 
data to. Once this is finished, I move the temporary graph to "forecast".
This seems to do the work twice, as it takes 1 hour for the upload and 1 
hour for the move.
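
A minimal sketch of that upload-then-move flow with Jena's RDFConnection (the 
endpoint, graph names, and file name are illustrative):

import org.apache.jena.rdfconnection.RDFConnection;
import org.apache.jena.rdfconnection.RDFConnectionFactory;

public class SwapForecastGraph {
    public static void main(String[] args) {
        // Connect to the Fuseki dataset (illustrative endpoint).
        try (RDFConnection conn = RDFConnectionFactory.connect("http://localhost:3030/ds")) {
            // Load the new model run into a temporary graph...
            conn.load("urn:graph:forecast-tmp", "new-model-run.ttl");
            // ...then replace the live graph with it in a single update.
            conn.update("MOVE <urn:graph:forecast-tmp> TO <urn:graph:forecast>");
        }
    }
}

Note the MOVE itself still copies data inside the store, which is the doubled 
work described above; it only keeps the live graph available while the new 
data loads.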

Our data consists of the following:

Locations (total 1607 -> 16070 triples):
mm-locations:8500015 a mm:Location ;
 a geosparql:Geometry ;
 owl:sameAs <...> ;
 geosparql:asWKT "POINT(7.61574425031 47.5425915732)"^^geosparql:wktLiteral 
;
 mm:station_name "Basel SBB GB Ost" ;
 mm:abbreviation "BSGO" ;
 mm:didok_id 8500015 ;
 geo:lat 47.54259 ;
 geo:long 7.61574 ;
 mm:elevation 273 .

Parameters (total 14 -> 56 triples):
mm-parameters:t_2m:C a mm:Parameter ;
 rdfs:label "t_2m:C" ;
 dcterms:description "Air temperature at 2m above ground in degree 
Celsius"@en ;
 mm:unit_symbol "˚C" .

Measurements (that is the huge bunch; 14 * 1607 * 48 ~ 1 Mio -> 5 Mio 
triples per day):
mm-measurements:8500015_2018-09-02T00:00:00Z_t_2m:C a mm:Measurement ;
 mm:location mm-locations:8500015 ;
 mm:validdate "2018-09-02T00:00:00Z"^^xsd:dateTime ;
 mm:value 15.1 ;
 mm:parameter mm-parameters:t_2m:C .

I would really appreciate it if someone could give me some advice on how to 
handle these tasks, or point out things I could do to optimize the 
organization of the data.

Many thanks and kind regards
Markus Neumann




Re: How to send custom header per request

2018-09-14 Thread Andy Seaborne




On 10/09/18 16:08, Niraj Patel wrote:

Thank you both for replying!


Can you tell us more about your use case? Are the custom headers for some one 
particular purpose?

Sure! So our database, Allegrograph, allows us to pass down custom headers 
while querying or updating in order to store that information in access logs. 
For each SPARQL request we want to send down unique request markers and 
usernames in order to be able to trace from a UI click to backend calls to 
queries that were performed in the graph. Does that make sense? Do y’all have 
any ideas now that y’all know the use case?


Makes sense.

Adding HTTP header information to track an operation end-to-end.

What might work is to have a point in the HttpOp execution flow that 
sees the HttpGet/HttpPost/... request just before it is acted upon.
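
Until such a hook exists, something close can be approximated with an Apache 
HttpClient request interceptor, which sees every outgoing request just before 
execution. A sketch (plain HttpClient, not Jena API; the header name and the 
ThreadLocal are illustrative):

import org.apache.http.HttpRequestInterceptor;
import org.apache.http.client.HttpClient;
import org.apache.http.impl.client.HttpClients;

public class RequestHookSketch {
    public static void main(String[] args) {
        // Per-request values can be staged in a ThreadLocal (or similar)
        // and read by the interceptor on each request.
        ThreadLocal<String> requestId = ThreadLocal.withInitial(() -> "req-0");

        HttpRequestInterceptor hook = (request, context) ->
                request.addHeader("X-Request-Id", requestId.get());

        HttpClient client = HttpClients.custom()
                .addInterceptorFirst(hook)
                .build();

        // Hand `client` to Jena (e.g. QueryEngineHTTP.setClient(client)),
        // then set requestId before each query or update.
        requestId.set("req-1234");
    }
}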




Niraj Patel
On Sep 9, 2018, 11:58 AM -0500, ajs6f wrote:

If the header is going to change on every request, setting default headers may 
not be flexible enough: the OP would have to change clients for every query.


Yes - but it works on the released version and isn't (I think) too 
expensive; the caveat is that it might break connection caching.




Can you tell us more about your use case? Are the custom headers for some one 
particular purpose?

In a released version, you may have to use your own HTTP client and just let 
Jena build the request bodies and parse the response bodies. We can look at 
adding this to the API in a future release, but I'd like to hear more about the 
use case first.

ajs6f


On Sep 9, 2018, at 12:20 PM, Andy Seaborne  wrote:



On 08/09/18 21:00, Niraj Patel wrote:

Hi!
I am using Jena's QueryEngineHTTP for queries and RemoteUpdateRequest for 
updates. I would like to send a custom header that will differ on each request. 
I did some digging around and it seems like it’s not possible. Using default 
headers when configuring Apache’s Http Client wouldn’t work in this case. Any 
ideas?
Niraj Patel


Hi there,

Either create a QueryEngineHTTP directly, passing in the required HttpClient 
specially created with setDefaultHeaders(headers).

Also, have a look at the builder for RDFConnections that are remote: 
RDFConnectionRemote.create()
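
A minimal sketch of that route, assuming the Jena 3.x RDFConnectionRemote 
builder; the destination and the header are illustrative:

import java.util.Collections;

import org.apache.http.client.HttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicHeader;
import org.apache.jena.rdfconnection.RDFConnection;
import org.apache.jena.rdfconnection.RDFConnectionRemote;

public class RemoteConnWithHeader {
    public static void main(String[] args) {
        // HttpClient with the custom header set as a default header.
        HttpClient client = HttpClients.custom()
                .setDefaultHeaders(Collections.singletonList(
                        new BasicHeader("X-Request-Id", "req-1234")))
                .build();

        try (RDFConnection conn = RDFConnectionRemote.create()
                .destination("http://localhost:3030/ds")
                .httpClient(client)
                .build()) {
            conn.querySelect("SELECT * WHERE { ?s ?p ?o } LIMIT 1",
                    row -> System.out.println(row));
        }
    }
}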

What's the use case for the custom header? I'm wondering if it is a usual or 
unusual situation.

Andy






Re: Updating large amounts of data

2018-09-14 Thread Markus Neumann
I will do. Thanks a lot for all your help already. 
If anyone else happens to run into this, here is how I'm handling the 
updates now:

I have a base_db, created with tdbloader2, containing the steady part of our 
dataset.
The Fuseki server config always reads the database from prod_db.

As soon as the update comes in, I do the following:
1. copy base_db to prod_db_[date_tag]
2. add the triples from the update to prod_db_[date_tag] with tdb2.tdbloader
3. run tdb2.tdbstats on prod_db_[date_tag] and bring the stats.opt file in place
4. shutdown fuseki
5. point the symlink at prod_db to prod_db_[date_tag]
6. start fuseki

What is not working yet is the spatial index, but that's a different issue that 
will get its own thread.
In theory, as the locations in our database don't change, it should work to 
have a Lucene index file outside of prod_db that serves the spatial information.

The whole process takes about 1 hour (tdb2.tdbloader takes that long adding a 
9GB file to an already large database), but as it does not impact the running 
db, that would be OK.

Comments and improvements are always welcome.

Thanks and kind regards
Markus


> On 14.09.2018 at 09:30, Marco Neumann wrote:
> 
> I remember giving or reading advice on this here on the mailing list. If
> you can't find it here, please consult the old Jena archive mailing list.
> 
> If you still can't find the answer to this question, please open a new
> thread and we will take it from there.
> 
> 
> On Fri, Sep 14, 2018 at 6:36 AM Markus Neumann wrote:
> 
>> I got the jar from
>> https://mvnrepository.com/artifact/org.apache.jena/jena-spatial/3.8.0
>> but the command from the documentation does not seem to work:
>> 
>> java -cp jena-spatial-3.8.0.jar jena.spatialindexer --loc
>> /srv/linked_data_store/prod_dp_2018-09-13-1
>> Error: Could not find or load main class jena.spatialindexer
>> 
>> 
>>> On 13.09.2018 at 21:47, Marco Neumann wrote:
>>> 
>>> Set the classpath to include the spatialIndexer
>>> 
>>> On Thu 13 Sep 2018 at 20:30, Markus Neumann wrote:
>>> 
 Hi,
 
 spatial index creation fails.
 I tried to figure out the documentation but failed. I can't find the
 jena.spatialindexer to build it manually, and the one I specified in my
 config does not work when I use the tdbloader.
 
 Any ideas?
 
 
> On 13.09.2018 at 19:48, Marco Neumann wrote:
> 
> to create the spatial index you can take a look at the "Building a
 Spatial
> Index" section in the "Spatial searches with SPARQL" documentation here
> 
> https://jena.apache.org/documentation/query/spatial-query.html
> 
> hint: if you don't get results for a spatial filter query that matches
 your
> data in the database your data isn't spatially indexed correctly. there
> will be no error or the like in the result set though.
> 
> 
> 
> On Thu, Sep 13, 2018 at 1:53 PM Markus Neumann <mneum...@meteomatics.com> wrote:
> 
>> Thanks for the links.
>> 
>> How do I see if the loader builds the spatial index? As far as I understood
>> the documentation, my config should produce the spatial index in memory. I
>> haven't figured that part out completely though:
>> When I start the database from scratch, the spatial indexing works. After
>> a restart I have to re-upload the stations file (which is no big deal as it
>> is only 593K in size) to regenerate the index.
>> I couldn't get it to work with a persistent index file though.
>> 
>> Right now I'm trying the tdb2.tdbloader (didn't see that before) and it
>> seems to go even faster:
>> 12:49:11 INFO  loader   :: Add: 41,000,000
>> 2017-01-01_1M_30min.ttl (Batch: 67,980 / Avg: 62,995)
>> 12:49:11 INFO  loader   ::   Elapsed: 650.84 seconds
>> [2018/09/13 12:49:11 UTC]
>> 
>> Is there a way to tell the loader that it should build the spatial index?
>> 
>> Yes, we have to use the spatial filter eventually, so I would highly
>> appreciate some more information on 

Re: Updating large amounts of data

2018-09-14 Thread Marco Neumann
I remember giving or reading advice on this here on the mailing list. If
you can't find it here, please consult the old Jena archive mailing list.

If you still can't find the answer to this question, please open a new
thread and we will take it from there.


On Fri, Sep 14, 2018 at 6:36 AM Markus Neumann wrote:

> I got the jar from
> https://mvnrepository.com/artifact/org.apache.jena/jena-spatial/3.8.0
> but the command from the documentation does not seem to work:
>
> java -cp jena-spatial-3.8.0.jar jena.spatialindexer --loc
> /srv/linked_data_store/prod_dp_2018-09-13-1
> Error: Could not find or load main class jena.spatialindexer
>
>
> > On 13.09.2018 at 21:47, Marco Neumann wrote:
> >
> > Set the classpath to include the spatialIndexer
> >
> > On Thu 13 Sep 2018 at 20:30, Markus Neumann wrote:
> >
> >> Hi,
> >>
> >> spatial index creation fails.
> >> I tried to figure out the documentation but failed. I can't find the
> >> jena.spatialindexer to build it manually, and the one I specified in my
> >> config does not work when I use the tdbloader.
> >>
> >> Any ideas?
> >>
> >>
> >>> On 13.09.2018 at 19:48, Marco Neumann wrote:
> >>>
> >>> to create the spatial index you can take a look at the "Building a
> >> Spatial
> >>> Index" section in the "Spatial searches with SPARQL" documentation here
> >>>
> >>> https://jena.apache.org/documentation/query/spatial-query.html
> >>>
> >>> hint: if you don't get results for a spatial filter query that matches
> >> your
> >>> data in the database your data isn't spatially indexed correctly. there
> >>> will be no error or the like in the result set though.
> >>>
> >>>
> >>>
> >>> On Thu, Sep 13, 2018 at 1:53 PM Markus Neumann <mneum...@meteomatics.com> wrote:
> >>>
>  Thanks for the links.
> 
>  How do I see if the loader builds the spatial index? As far as I understood
>  the documentation, my config should produce the spatial index in memory. I
>  haven't figured that part out completely though:
>  When I start the database from scratch, the spatial indexing works. After
>  a restart I have to re-upload the stations file (which is no big deal as it
>  is only 593K in size) to regenerate the index.
>  I couldn't get it to work with a persistent index file though.
> 
>  Right now I'm trying the tdb2.tdbloader (didn't see that before) and it
>  seems to go even faster:
>  12:49:11 INFO  loader   :: Add: 41,000,000
>  2017-01-01_1M_30min.ttl (Batch: 67,980 / Avg: 62,995)
>  12:49:11 INFO  loader   ::   Elapsed: 650.84 seconds
>  [2018/09/13 12:49:11 UTC]
> 
>  Is there a way to tell the loader that it should build the spatial index?
> 
>  Yes, we have to use the spatial filter eventually, so I would highly
>  appreciate some more information on the correct setup here.
> 
>  Many thanks.
> 
> > On 13.09.2018 at 14:19, Marco Neumann <marco.neum...@gmail.com> wrote:
> >
> > :-)
> >
> > This sounds much better, Markus. Now, with regard to the optimizer, please
> > consult the online documentation here:
> >
> > https://jena.apache.org/documentation/tdb/optimizer.html
> > (it's a very simple process to create the stats file and place it in the
> > directory)
> >
> > Also, did the loader index the spatial data? Do your queries make use of
> > the spatial filter?
> >
> > On Thu, Sep 13, 2018 at 12:59 PM Markus Neumann <mneum...@meteomatics.com> wrote:
> >
> >> Marco,
> >>
> >> I just tried the tdbloader2 script with 1 month of data:
> >>
> >> INFO  Total: 167,385,120 tuples : 1,143.55 seconds : 146,373.23 tuples/sec
> >> [2018/09/13 11:29:31 UTC]
> >> 11:41:44 INFO Index Building Phase Completed
> >> 11:41:46 INFO -- TDB Bulk Loader Finish
> >> 11:41:46 INFO -- 1880 seconds
> >>
> >> That's already a lot better. I'm working on a way to reduce the amount
> >> of data by 
> >> Can you give me a pointer on
> >>> don't forget to run the tdb optimizer to generate the stats.opt file.
> >> ? I haven't heard of that so far...
> >>
> >> A more general question:
>