Re: Txn code not handling type of transaction

George News Fri, 29 Dec 2017 05:28:13 -0800

On 2017-12-29 12:59, ajs6f wrote:
> It's going to be a lot harder to solve this issue if other people
> cannot replicate it. You might want to write an integration test
> using something like Awaitility.


I will have a look at it. It is not that I don't want to share the code;
I currently can't, due to internal policies.

> One change you could make that probably won't solve your problem but
> will make your code easier to understand would be to stop using
> Graphs. That's part of the Jena SPI and while those types are stable,
> the intention is for you to use the API (Dataset and Model) and
> there's no reason here not to.
> 
> Model union =
> dataset.getNamedModel("G1").union(dataset.getNamedModel("G2")).union(dataset.getNamedModel("G4"));
>
>  etc.

I will try that to check whether it works better, and I will keep you posted.

I'm also thinking of creating a queue to handle writes and reads.
This way I would run reads in parallel and pause them to launch writes
according to a scheduler.
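
A minimal sketch of that idea, assuming a plain JDK read/write lock placed in front of the dataset rather than anything provided by Jena. The class name DatasetGate and its two methods are hypothetical; the actions passed in would be the Txn.calculateRead and Txn.executeWrite calls from the use case quoted below. Reads may overlap with each other, while a write waits for in-flight reads and then runs alone.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

/** Hypothetical gate: many concurrent readers, one exclusive writer. */
final class DatasetGate {
    // Fair mode: a waiting writer is not starved by a steady stream of readers.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);

    /** Runs a read action; it may overlap with other reads, never with a write. */
    <T> T read(Supplier<T> action) {
        lock.readLock().lock();
        try {
            return action.get();
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Runs a write action once all in-flight reads have finished. */
    void write(Runnable action) {
        lock.writeLock().lock();
        try {
            action.run();
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Usage would look like gate.read(() -> runQuery(...)) and gate.write(() -> addToG2(...)), where runQuery and addToG2 stand in for the existing code. Note that a long read still delays every write queued behind it, so this does not remove the web service timeout concern; it only makes the ordering explicit.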

> If you expect to scale this application very far, you will probably
> want to solve your problem by introducing Fuseki and having your
> application make calls to it, instead of trying to manage concurrency
> yourself. You are already using SPARQL according to your example.
> Then you can use SPARQL keywords to pick out the named graphs
> together over which you want to run your query:
> 
> https://jena.apache.org/tutorials/sparql_datasets.html

This is not an option for us. I'm not parsing the SPARQL queries. The thing
is that graphs are created internally and periodically in order to speed up
response times. We have noticed that a single graph holding all our
data makes the system quite slow and unresponsive, whereas dividing it into
smaller subgraphs makes the system quicker. We then merge graphs
depending on the user request (chosen by the REST endpoint, not by
FROM/GRAPH clauses).

> Otherwise, your code shows you using qExec.execSelect(), but your
> stacktrace shows the exception arising after
> QueryExecutionBase.execAsk, so are you actually executing different
> code than what you are showing us?

Sorry for that. I launch an ASK in order to check whether there will be
values in the ResultSet or not. This was something I asked in another thread
(Problem with MAX when no result expected).

Anyway, treat that ASK as a SELECT: the behaviour is similar if
I only run the SELECT.

> ajs6f
> 
>> On Dec 29, 2017, at 4:51 AM, George News <[email protected]>
>> wrote:
>> 
>> On 2017-12-28 18:02, dandh988 wrote:
>>> Can you give a complete example? How are you calling MultiUnion?
>>> Are you calling txn read on the dataset then building the union,
>>> then calling txn write to update the graph?
>> 
>> Let's describe a use case.
>> 
>> 1) I have a dataset that includes 4 named graphs: G1, G2, G3, G4
>> 
>> 2.a) Someone initiates a SPARQL request and the internal procedure
>> followed is:
>> 
>> return Txn.calculateRead(dataset, () -> {
>>     // Create a MultiUnion of 3 named graphs
>>     MultiUnion union = new MultiUnion();
>>     union.addGraph(dataset.getNamedModel("G1").getGraph());
>>     union.addGraph(dataset.getNamedModel("G2").getGraph());
>>     union.addGraph(dataset.getNamedModel("G4").getGraph());
>> 
>>     Model m = ModelFactory.createModelForGraph(union);
>> 
>>     // Launch the SPARQL query on it
>>     try (QueryExecution qExec = QueryExecutionFactory.create(query, m)) {
>>         return ResultSetFactory.copyResults(qExec.execSelect());
>>     }
>> });
>> 
>> 2.b) Someone initiates a SPARQL request and the internal procedure
>> followed is:
>> 
>> return Txn.calculateRead(dataset, () -> {
>>     // Create a MultiUnion of 2 named graphs
>>     // (this code is in a function)
>>     MultiUnion union = new MultiUnion();
>>     union.addGraph(dataset.getNamedModel("G1").getGraph());
>>     union.addGraph(dataset.getNamedModel("G2").getGraph());
>>     Model m1 = ModelFactory.createModelForGraph(union);
>> 
>>     // Retrieve named graph G3
>>     // (this code is in a function)
>>     Model m2 = dataset.getNamedModel("G3");
>> 
>>     Model m = ModelFactory.createUnion(m2, m1);
>> 
>>     // Launch the SPARQL query on it
>>     try (QueryExecution qExec = QueryExecutionFactory.create(query, m)) {
>>         return ResultSetFactory.copyResults(qExec.execSelect());
>>     }
>> });
>> 
>> 
>> 3) Someone initiates a write in G2:
>> 
>> // m stores the new entity model
>> Model m;
>> Txn.executeWrite(dataset, () -> {
>>     dataset.getNamedModel("G2").add(m);
>> });
>> 
>> 
>> 
>> If either version of 2) and 3) are not run in parallel, there is
>> no problem and everything is executed correctly.
>> 
>> The problem arises when 2.a) or 2.b) is running and, before it ends,
>> someone tries to perform 3). Then I get the FileException("In the
>> middle of an alloc-write").
>> 
>> Do you have any idea how to avoid this? How can I handle
>> transactions in this model?
>> 
>> 
>> I was thinking of creating a global mutex so that if any action is
>> being performed on the dataset, the rest would be blocked. The
>> problem here is that the code is part of a web service, so if
>> the read/write operation takes long, I will get a timeout that will
>> close the connection.
>> 
>> The other option is to disable writing while someone is reading.
>> The main problem here is how to properly reschedule writes so as not
>> to build up a big queue.
>> 
>> Any help is more than welcome. I don't know what else to do to
>> solve this issue, and the problem is making the service unusable
>> :(
>> 
>> Thanks a lot for the great help you are all offering. Jorge
>> 
>>> 
>>> 
>>> Dick
>>> 
>>> -------- Original message --------
>>> From: George News <[email protected]>
>>> Date: 28/12/2017 13:17 (GMT+00:00)
>>> To: [email protected]
>>> Subject: Re: Txn code not handling type of transaction
>>> 
>>> On 2017-12-27 19:32, Andy Seaborne wrote:
>>>> 
>>>> 
>>>> On 27/12/17 18:19, George News wrote:
>>>>> One thing I have forgotten to mention is that I'm using a
>>>>> static variable to store the dataset reference. Is this
>>>>> important?
>>>> 
>>>> No.
>>>> 
>>>> The top layer of transaction code uses ThreadLocal variables.
>>>> 
>>>> Internally, there are Transaction objects.
>>>> 
>>>> Andy
>>> 
>>> Another doubt on the issue with transactions. I'm generating 
>>> MultiUnions from the same dataset graphs. While in one thread
>>> I'm reading from a MultiUnion, on another thread I'm writing to
>>> one of the graphs included in the MultiUnion (the standalone graph,
>>> not the MultiUnion).
>>> 
>>> I have modified my code to use Txn everywhere and I am still having 
>>> issues. This is why I'm asking how MultiUnions are handled 
>>> with respect to transactions.
>>> 
>>> 
>>>>> 
>>>>> 
>>>>> On 2017-12-27 19:13, George News wrote:
>>>>>> Just out of curiosity, in case the problem is that 
>>>>>> everything is done in the same thread: do you know if
>>>>>> Wildfly handles every request on the same thread? I
>>>>>> guess not; that would be really strange.
>>>> 
>>>>>> 
>>>>>> The point is that I have one REST endpoint for writing and 
>>>>>> another for reading. A write happens roughly every 1 to 10 
>>>>>> seconds. If I execute a read that takes longer than
>>>>>> that, I get the alloc-write exception.
>>>>>> 
>>>>>> Thanks a lot. I just don't know why it is happening. I 
>>>>>> think the code now uses Txn (at one point I
>>>>>> copied the Txn.java behaviour) and I still get the error.
>>>>>> 
>>>>>> On 2017-12-27 17:18, ajs6f wrote:
>>>>>>>> Then nesting is not safe, as you might have initially
>>>>>>>> opened a read transaction and then include a write.
>>>>>>>> If the parent one is a write there shouldn't be such an
>>>>>>>> issue (I guess).
>>>>>>> 
>>>>>>> Actual nesting is not supported right now, period. 
>>>>>>> Transactions in Jena are currently thread-local, so what
>>>>>>> you describe above I would see as a mistaken use of the
>>>>>>> API. If a client needs to move from a READ to a WRITE
>>>>>>> transaction, it's usually appropriate to close the
>>>>>>> transaction and open a new one (transactions in TIM, for
>>>>>>> instance, are snapshot isolated, and I believe the same
>>>>>>> is true of TDB2). Transactional objects _should not_ be
>>>>>>> passed from thread to thread with open transactions,
>>>>>>> unless additional client machinery is in place to manage
>>>>>>> those transactions, such as Dick has described in another
>>>>>>> thread (using thread proxies).
>>>>>>> 
>>>>>>> Promotion machinery does exist in Jena but I am not
>>>>>>> aware that any of the dataset implementations actually
>>>>>>> support it right now. I could be wrong about that, since
>>>>>>> I didn't write the promotion code. Jena isn't a SQL
>>>>>>> database and doesn't offer the same kinds of guarantees
>>>>>>> or tradeoffs. Correctly promoting transactions in the
>>>>>>> absence of information about data dependencies is
>>>>>>> non-trivial.
>>>>>>> 
>>>>>>> That having been said, it should be possible to add a 
>>>>>>> "transaction type" method to transactional objects,
>>>>>>> within the "thread-local" design. Some dataset
>>>>>>> implementations already have one, e.g.
>>>>>>> DatasetGraphInMemory::transactionType. You might want to
>>>>>>> start by adding it to the o.a.j.sparql.core.Transactional
>>>>>>> interface and then "catching up" the implementation code
>>>>>>> to build up a PR.
>>>>>>> 
>>>>>>> Adam Soroka
>>>>>>> 
>>>>>>>> On Dec 27, 2017, at 6:53 AM, George News 
>>>>>>>> <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> On 2017-12-27 12:29, Claude Warren wrote:
>>>>>>>>> I recently wrote some code to try to handle a
>>>>>>>>> similar situation. In my case I knew I needed a
>>>>>>>>> transaction to be active at various points so I
>>>>>>>>> created a TransactionHolder. I create the holder,
>>>>>>>>> passing the object that implements Transactional
>>>>>>>>> as well as the type of ReadWrite I want.
>>>>>>>>> 
>>>>>>>>> If a transaction is active it does nothing (and I
>>>>>>>>> hope the proper transaction has been started);
>>>>>>>>> otherwise it starts the transaction. At the end I
>>>>>>>>> call commit or abort as appropriate. If I did not
>>>>>>>>> start the transaction, the commit, abort or end is
>>>>>>>>> ignored.
>>>>>>>> 
>>>>>>>> I see the same problem in your code as I pointed out 
>>>>>>>> before. A transaction behaves differently depending on
>>>>>>>> whether it is READ or WRITE, and your code does not make
>>>>>>>> that distinction. In fact, isInTransaction() doesn't give
>>>>>>>> you this information.
>>>>>>>> 
>>>>>>>> Then nesting is not safe, as you might have initially
>>>>>>>> opened a read transaction and then include a write.
>>>>>>>> If the parent one is a write there shouldn't be such an
>>>>>>>> issue (I guess).
>>>>>>>> 
>>>>>>>> Just for you to know, your code is more or less
>>>>>>>> integrated in [1], so you can "update" your code.
>>>>>>>> 
>>>>>>>> [1]: 
>>>>>>>> jena/jena-arq/src/main/java/org/apache/jena/system/Txn.java
>>>>>>>> 
>>>>>>>>> I think there may be an issue with abort in that it should
>>>>>>>>> probably set up end() to throw an exception when I
>>>>>>>>> have not created the transaction so that the outer 
>>>>>>>>> transaction will fail.
>>>>>>>>> 
>>>>>>>>> import org.apache.jena.query.ReadWrite;
>>>>>>>>> import org.apache.jena.sparql.core.Transactional;
>>>>>>>>> 
>>>>>>>>> public class TransactionHolder {
>>>>>>>>>     private final Transactional txn;
>>>>>>>>>     private final boolean started;
>>>>>>>>>     private final ReadWrite rw;
>>>>>>>>> 
>>>>>>>>>     public TransactionHolder(Transactional txn, ReadWrite rw) {
>>>>>>>>>         this.txn = txn;
>>>>>>>>>         this.rw = rw;
>>>>>>>>>         // Only begin a transaction if none is already active.
>>>>>>>>>         started = !txn.isInTransaction();
>>>>>>>>>         if (started) {
>>>>>>>>>             txn.begin(rw);
>>>>>>>>>         }
>>>>>>>>>     }
>>>>>>>>> 
>>>>>>>>>     public boolean ownsTransaction() { return started; }
>>>>>>>>> 
>>>>>>>>>     public void commit() { if (started) { txn.commit(); } }
>>>>>>>>> 
>>>>>>>>>     public void abort() { if (started) { txn.abort(); } }
>>>>>>>>> 
>>>>>>>>>     public void end() { if (started) { txn.end(); } }
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Wed, Dec 27, 2017 at 11:03 AM, dandh988 
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>>> You cannot nest transactions nor can you promote a
>>>>>>>>>> read to a write. You need to rewrite your code or
>>>>>>>>>> use txn which correctly checks if a transaction is
>>>>>>>>>> available and if not will begin the correct one,
>>>>>>>>>> either READ or WRITE.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Dick
>>>>>>>>>> 
>>>>>>>>>> -------- Original message --------
>>>>>>>>>> From: George News <[email protected]>
>>>>>>>>>> Date: 27/12/2017 10:27 (GMT+00:00)
>>>>>>>>>> To: Jena User Mailing List <[email protected]>
>>>>>>>>>> Subject: Txn code not handling type of transaction
>>>>>>>>>> 
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>> As you know from other threads I'm having some
>>>>>>>>>> issues with transactions. Your suggestion is to use
>>>>>>>>>> Txn instead of begin/end. Just for curiosity I have
>>>>>>>>>> checked the Txn code at [1] and it seems that
>>>>>>>>>> inside you use begin/end.
>>>>>>>>>> 
>>>>>>>>>> However I have a doubt concerning how you handle
>>>>>>>>>> the begin/end for READ and WRITE. It seems that you
>>>>>>>>>> open a transaction based on txn.isInTransaction(),
>>>>>>>>>> but how do you know if it is a READ or WRITE?
>>>>>>>>>> 
>>>>>>>>>> If you create something like:
>>>>>>>>>> 
>>>>>>>>>> Txn.executeRead(dataset, () -> {
>>>>>>>>>>     Txn.executeWrite(dataset, () -> {
>>>>>>>>>>         // Whatever
>>>>>>>>>>     });
>>>>>>>>>> });
>>>>>>>>>> 
>>>>>>>>>> the txn.begin(ReadWrite.WRITE) is not called and 
>>>>>>>>>> therefore it might lead to unexpected
>>>>>>>>>> behaviour in txn.commit().
>>>>>>>>>> 
>>>>>>>>>> Could you give some hints on how this is handled 
>>>>>>>>>> internally? Before fully modifying the code I have,
>>>>>>>>>> it might be easier to replicate the Txn behaviour
>>>>>>>>>> ;) but I would like to know the above (if
>>>>>>>>>> possible).
>>>>>>>>>> 
>>>>>>>>>> As always, thanks in advance. Jorge
>>>>>>>>>> 
>>>>>>>>>> [1]: 
>>>>>>>>>> jena/jena-arq/src/main/java/org/apache/jena/system/Txn.java
