Re: Transactions linked to dataset object and not TDB itself

George News Wed, 13 Sep 2017 01:49:16 -0700

On 2017-09-12 23:21, Andy Seaborne wrote:
> Thanks Rob.
> 
> "isInTransaction" means "have I called begin?"
> 
> 
> On 12/09/17 12:07, George News wrote:
>> On 2017-09-12 12:05, Rob Vesse wrote:
>>> I think that Andy’s response was perhaps not comprehensive enough.
>>>
>>> As he explained a storage area supports any number of read
>>> transactions and at the most one write transaction. The number of
>>> active transactions is centrally managed within the JVM and scoped by
>>> the storage area.
>>>
>>> However, each dataset instance you create has its own isolated
>>> transactional view onto this storage. This allows transactions to
>>> interleave safely: the point at which a transaction begins
>>> dictates what view each dataset instance gets. So, for example, you
>>> could start a read transaction that is long-running, have a short
>>> write transaction and then another read transaction. The first read
>>> transaction is seeing the state of the database prior to the write
>>> whereas the second read transaction would be seeing the state of the
>>> database after the write.
>>
>> Nice explanation ;) But then if I use Fuseki approach of a static
>> Dataset, will I get an HTTP response for an operation if there is a
>> previous transaction running?
> 
> Different question.
> 
> Fuseki does not call "isInTransaction" - it calls begin and only ever
> has one dataset on one thread.
> 
> Two datasets on one thread is a somewhat specialised case (and actually
> exposes the fact that, deep inside TDB, the transaction mechanism is
> not completely tied to threads).
> 
> If an outstanding write is happening, a second begin(WRITE) will block.
> Otherwise, you get parallelism.
> 
> You can try this.  It's easier to put pauses inside queries or updates -
> use "afn:wait(5000)", a function, inside a pattern match (query or
> update) and that will pause for 5 seconds holding the transaction open.
> 
>     Andy
>
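
For reference, the rules described in the quoted replies (any number of
readers, at most one writer, each reader's view fixed when it begins) can
be sketched with a toy model that does not use Jena at all. Everything in
it is hypothetical: the class StorageAreaModel and its methods beginRead,
tryBeginWrite and commitWrite only mimic the described behaviour. In
particular, the real begin(WRITE) blocks, whereas this sketch returns
false so that it can run on a single thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Semaphore;

/** Toy model (NOT Jena code) of one storage area: many readers, one writer. */
final class StorageAreaModel {
    // Committed state. It is replaced wholesale on commit, so snapshots
    // handed out earlier are never modified.
    private volatile List<String> committed = Collections.emptyList();

    // A single permit: at most one write transaction is active at a time.
    private final Semaphore writerPermit = new Semaphore(1);

    /** A read "transaction" is simply the snapshot taken when it begins. */
    List<String> beginRead() {
        return committed;
    }

    /** Assumption: returns false where the real begin(WRITE) would block. */
    boolean tryBeginWrite() {
        return writerPermit.tryAcquire();
    }

    /** Publishes the writer's additions and ends the write transaction. */
    void commitWrite(String... added) {
        List<String> next = new ArrayList<>(committed);
        next.addAll(Arrays.asList(added));
        committed = Collections.unmodifiableList(next);
        writerPermit.release();
    }

    public static void main(String[] args) {
        StorageAreaModel area = new StorageAreaModel();

        List<String> longReader = area.beginRead();   // long-running read begins
        System.out.println(area.tryBeginWrite());     // true: first writer enters
        System.out.println(area.tryBeginWrite());     // false: second writer must wait
        area.commitWrite("triple-1");                 // writer commits and ends

        List<String> lateReader = area.beginRead();   // read begun after the commit
        // The earlier reader still sees the old state; the later one sees the write.
        System.out.println(longReader.size() + " " + lateReader.size()); // 0 1
    }
}
```

The semaphore stands in for the single-writer rule, and swapping in a fresh
immutable list on commit stands in for the isolated view each reader gets.
Neither is a claim about how TDB implements this internally.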


Sorry for insisting so much on this issue, but I don't fully understand
the behaviour. Let's try some examples using a static reference to
the dataset object retrieved from TDBFactory.createDataset(path).

# Case 1) Static reference to dataset for read

- Server receives an HTTP request and an instance of the REST endpoint
management class is created
- Server class instance 1 (SCI1) uses static dataset reference
  dataset.begin(ReadWrite.READ);
- SCI1 starts a quite time consuming read operation (Operation1)
- Server receives another HTTP request and an instance of the REST
endpoint management class is created.
- Server class instance 2 (SCI2) uses static dataset reference
  dataset.begin(ReadWrite.READ);
- SCI2 read operation finishes and ends transaction.
  dataset.end();
- SCI1 read operation finishes and ends transaction.
  dataset.end();

In this case, are both operations fulfilled and you get the results, or
does SCI2 block because the dataset is in a transaction?
How does the dataset know which transaction has ended, as both calls are
made on the same object?


# Case 2) Static reference to dataset for read/write

- Server receives an HTTP request and an instance of the REST endpoint
management class is created
- Server class instance 1 (SCI1) uses static dataset reference
  dataset.begin(ReadWrite.READ);
- SCI1 starts a quite time consuming read operation
- Server receives another HTTP request, in this case for writing, and an
instance of the REST endpoint management class is created.
- Server class instance 2 (SCI2) uses static dataset reference
  dataset.begin(ReadWrite.WRITE);
- SCI2 write operation finishes and ends transaction.
  dataset.commit();
  dataset.end();
- SCI1 read operation finishes and ends transaction.
  dataset.end();

In this case, are both operations fulfilled, or is SCI2 not going to
work? Does SCI1 only see the data as it was before SCI2 commits? Does
execution block at any point?


# Case 3) Static reference to dataset for write

- Server receives an HTTP request and an instance of the REST endpoint
management class is created
- Server class instance 1 (SCI1) uses static dataset reference
  dataset.begin(ReadWrite.WRITE);
- SCI1 starts a quite time consuming write operation
- Server receives another HTTP request and an instance of the REST
endpoint management class is created.
- Server class instance 2 (SCI2) uses static dataset reference
  dataset.begin(ReadWrite.WRITE);
- SCI2 write operation finishes and ends transaction.
  dataset.commit();
  dataset.end();
- SCI1 write operation finishes and ends transaction.
  dataset.commit();
  dataset.end();

In this case, are both operations fulfilled, or is SCI2 blocked until
SCI1 ends?


Once this is understood I guess we can go on with the case of the
non-static reference ;)



>>
>>
>>> Rob
>>>
>>> On 12/09/2017 10:24, "George News" <[email protected]> wrote:
>>>
>>> On 2017-09-12 10:43, Andy Seaborne wrote:
>>>> They are per storage area.
>>>>
>>>> This blocks and never prints "DONE"
>>>>
>>>> Location loc = Location.create("DB");
>>>> Dataset dataset1 = TDBFactory.createDataset(loc);
>>>> Dataset dataset2 = TDBFactory.createDataset(loc);
>>>> dataset1.begin(ReadWrite.WRITE);
>>>> dataset2.begin(ReadWrite.WRITE);
>>>> System.out.println("DONE");
>>>>
>>>> but if either is a READ, it will work - there can be many readers and
>>>> one writer at a time.  The readers will not see the updates made by the
>>>> writer even after the writer commits.
>>>
>>> I understand that, but I still think they are not linked to the
>>> storage area. If you put
>>>
>>> Location loc = Location.create("DB");
>>> Dataset dataset1 = TDBFactory.createDataset(loc);
>>> Dataset dataset2 = TDBFactory.createDataset(loc);
>>> dataset1.begin(ReadWrite.READ);
>>> System.out.println(dataset1.isInTransaction());
>>> System.out.println(dataset2.isInTransaction());
>>> dataset2.begin(ReadWrite.READ);
>>> System.out.println("DONE");
>>>
>>> It will print true for dataset1 and false for dataset2 cases. This
>>> means that the transaction is linked to the object Dataset and not
>>> the real location. Or at least this is what is happening to me.
>>>
>>> Therefore I think this is a bug :( as the READ transaction is opened
>>> over the same location. I haven't checked for WRITE but I guess
>>> it should be the same. If you write to a dataset and you have
>>> several transactions open, this means you will have a kind of
>>> counter (semaphore), and when you call .end() you finish them.
>>>
>>>>
>>>> Creating the datasets is quite cheap. It is not really creating
>>>> everything every time. But a static works as well; Fuseki uses a
>>>> static registry of datasets.
>>>>
>>>> (it's called "connect", not "create", in TDB2 to make that
>>>> clearer).
>>>>
>>>> Andy
>>>
>>> I think the static will be the way to go for me for the cleanness of
>>> the code, as otherwise it will be more complex to handle.
>>>
>>>>
>>>> On 11/09/17 15:57, George News wrote:
>>>>> Hi all,
>>>>>
>>>>> I'm facing an issue that I guess was implemented that way for
>>>>> some reason. The issue is that I thought that transactions were
>>>>> dataset based: tied not to the object but to the TDB or whatever
>>>>> database you use.
>>>>>
>>>>> However while developing my service I have noticed that if you
>>>>> open 2 datasets on the same TDB
>>>>>
>>>>> Dataset dataset1 = TDBFactory.createDataset(tripleStorePath);
>>>>> Dataset dataset2 = TDBFactory.createDataset(tripleStorePath);
>>>>>
>>>>> then each dataset has its own transaction pointer, that is,
>>>>> read/write operations block per object. Is that the expected
>>>>> behaviour? Why is it like this and not blocked per triple store?
>>>>>
>>>>> Therefore my question now goes in the direction of which is
>>>>> better. I'm developing a webservice that is working against the
>>>>> same triple store path. The Dataset object I create on each call
>>>>> is linked to the instance of the class (not static). Then, how
>>>>> should I proceed? Should I create the Dataset variable as static,
>>>>> so this way I only have one object for all?
>>>>>
>>>>> Thanks Regards,
>>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>>
>
