[jira] [Comment Edited] (IGNITE-6181) Tx is not rolled back on timeout leading to potential whole grid hang

2017-09-06 Thread Semen Boikov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16155203#comment-16155203
 ] 

Semen Boikov edited comment on IGNITE-6181 at 9/6/17 11:19 AM:
---

[~ascherbakov],

Thanks for fixes, after one more review I have more comments:
1.
{noformat}
/** Timed out local transactions are cleared in {@link 
#onLocalClose}. */
if (!tx.timedOut() || !tx.pessimistic())
threadMap.remove(tx.threadId(), tx);
{noformat}

Why optimistic/pessimistic txs are handled in different way? 

Actually I do not like these changes related to threadMap. I think more clear 
fix is completely get rid of threadMap, and replace it with ThreadLocal (anyway 
as I can see we always access threadMap using current thread's id).This 
ThreadLocal should be set after tx is created and cleared in close(). Please 
try implement this.

2. 100ms timeout can not fix race. Please provide more details what race are 
you talking about? Let' find correct way to fix it
3. TimeoutObject is not removed when 'tx.rollback' is called (I changed your 
test to reproduce it, take a look), please fix.
4. Please add tests for all tx isolation/concurrency types.
5. It seems we do not need this TimeoutObject for implicit transactions (i.e. 
just cache.put without explicit txStart call) ? Please add this check.
6. About changes in IgniteTransactionsImpl.txStart0: as I understand it is 
possible to get tx there only if tx.close() was not called? This is incorrect 
API usage and I think exception is ok in this cases, so changes in 
IgniteTransactionsImpl.txStart0 are not needed.

Thanks!



was (Author: sboikov):
[~ascherbakov]],

Thanks for fixes, after one more review I have more comments:
1.
{noformat}
/** Timed out local transactions are cleared in {@link 
#onLocalClose}. */
if (!tx.timedOut() || !tx.pessimistic())
threadMap.remove(tx.threadId(), tx);
{noformat}

Why optimistic/pessimistic txs are handled in different way? 

Actually I do not like these changes related to threadMap. I think more clear 
fix is completely get rid of threadMap, and replace it with ThreadLocal (anyway 
as I can see we always access threadMap using current thread's id).This 
ThreadLocal should be set after tx is created and cleared in close(). Please 
try implement this.

2. 100ms timeout can not fix race. Please provide more details what race are 
you talking about? Let' find correct way to fix it
3. TimeoutObject is not removed when 'tx.rollback' is called (I changed your 
test to reproduce it, take a look), please fix.
4. Please add tests for all tx isolation/concurrency types.
5. It seems we do not need this TimeoutObject for implicit transactions (i.e. 
just cache.put without explicit txStart call) ? Please add this check.
6. About changes in IgniteTransactionsImpl.txStart0: as I understand it is 
possible to get tx there only if tx.close() was not called? This is incorrect 
API usage and I think exception is ok in this cases, so changes in 
IgniteTransactionsImpl.txStart0 are not needed.

Thanks!


> Tx is not rolled back on timeout leading to potential whole grid hang
> -
>
> Key: IGNITE-6181
> URL: https://issues.apache.org/jira/browse/IGNITE-6181
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.1
>Reporter: Alexei Scherbakov
>Assignee: Alexei Scherbakov
> Fix For: 2.3
>
>
> Unit test reproducer:
> {noformat}
> /*
>  * Licensed to the Apache Software Foundation (ASF) under one or more
>  * contributor license agreements.  See the NOTICE file distributed with
>  * this work for additional information regarding copyright ownership.
>  * The ASF licenses this file to You under the Apache License, Version 2.0
>  * (the "License"); you may not use this file except in compliance with
>  * the License.  You may obtain a copy of the License at
>  *
>  *  http://www.apache.org/licenses/LICENSE-2.0
>  *
>  * Unless required by applicable law or agreed to in writing, software
>  * distributed under the License is distributed on an "AS IS" BASIS,
>  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>  * See the License for the specific language governing permissions and
>  * limitations under the License.
>  */
> package org.apache.ignite.cache;
> import java.util.concurrent.CountDownLatch;
> import java.util.concurrent.TimeUnit;
> import java.util.concurrent.atomic.AtomicBoolean;
> import org.apache.ignite.Ignite;
> import org.apache.ignite.IgniteException;
> import org.apache.ignite.configuration.CacheConfiguration;
> import 

[jira] [Comment Edited] (IGNITE-6181) Tx is not rolled back on timeout leading to potential whole grid hang

2017-08-25 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16141996#comment-16141996
 ] 

Alexei Scherbakov edited comment on IGNITE-6181 at 8/25/17 6:54 PM:


"By design pessimistic transaction can detect timeout if some transaction 
related process is in progress (lock acquiring or prepare phase)"

This is exactly the thing needed to be somehow fixed.

If a transaction is started and abandoned for some case(e.g exception in the 
middle and not wrapping a transaction with try-with-resources block), it will 
lock all subsequent transactions(without timeouts) which are trying to acquire 
locks held by first transaction and will block exchange process.

I suggest to associate a TimeoutObject with a transaction having non-zero 
timeout.

Reopening the issue.


was (Author: ascherbakov):
"By design pessimistic transaction can detect timeout if some transaction 
related process is in progress (lock acquiring or prepare phase)"

This is exactly the thing needed to be somehow fixed.

If a transaction is started and abandoned for some case(e.g exception in the 
middle and not wrapping a transaction with try-with-resources block), it will 
lock all subsequent transactions(without timeouts) which are trying to acquire 
locks held by hanging thread and will block exchange process.

I suggest to associate a TimeoutObject with a transaction having non-zero 
timeout.

Reopening the issue.

> Tx is not rolled back on timeout leading to potential whole grid hang
> -
>
> Key: IGNITE-6181
> URL: https://issues.apache.org/jira/browse/IGNITE-6181
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.1
>Reporter: Alexei Scherbakov
> Fix For: 2.3
>
>
> Unit test reproducer:
> {noformat}
> /*
>  * Licensed to the Apache Software Foundation (ASF) under one or more
>  * contributor license agreements.  See the NOTICE file distributed with
>  * this work for additional information regarding copyright ownership.
>  * The ASF licenses this file to You under the Apache License, Version 2.0
>  * (the "License"); you may not use this file except in compliance with
>  * the License.  You may obtain a copy of the License at
>  *
>  *  http://www.apache.org/licenses/LICENSE-2.0
>  *
>  * Unless required by applicable law or agreed to in writing, software
>  * distributed under the License is distributed on an "AS IS" BASIS,
>  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>  * See the License for the specific language governing permissions and
>  * limitations under the License.
>  */
> package org.apache.ignite.cache;
> import java.util.concurrent.CountDownLatch;
> import java.util.concurrent.TimeUnit;
> import java.util.concurrent.atomic.AtomicBoolean;
> import org.apache.ignite.Ignite;
> import org.apache.ignite.IgniteException;
> import org.apache.ignite.configuration.CacheConfiguration;
> import org.apache.ignite.configuration.IgniteConfiguration;
> import org.apache.ignite.configuration.TransactionConfiguration;
> import org.apache.ignite.internal.IgniteInternalFuture;
> import org.apache.ignite.internal.util.typedef.internal.U;
> import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
> import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;
> import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest;
> import org.apache.ignite.transactions.Transaction;
> import org.apache.ignite.transactions.TransactionConcurrency;
> import org.apache.ignite.transactions.TransactionIsolation;
> /**
>  * Tests ability to rollback not properly closed transaction.
>  */
> public class IgniteTxTimeoutTest extends GridCommonAbstractTest {
> /** */
> private static final long TX_TIMEOUT = 3_000L;
> /** */
> private static final String CACHE_NAME = "test";
> /** IP finder. */
> private static final TcpDiscoveryVmIpFinder IP_FINDER = new 
> TcpDiscoveryVmIpFinder(true);
> /** */
> private final CountDownLatch l = new CountDownLatch(1);
> /** */
> private final Object mux = new Object();
> /** {@inheritDoc} */
> @Override protected IgniteConfiguration getConfiguration(String 
> igniteInstanceName) throws Exception {
> IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName);
> cfg.setDiscoverySpi(new TcpDiscoverySpi().setIpFinder(IP_FINDER));
> TransactionConfiguration txCfg = new TransactionConfiguration();
> txCfg.setDefaultTxTimeout(TX_TIMEOUT);
> cfg.setTransactionConfiguration(txCfg);
> CacheConfiguration ccfg = new CacheConfiguration(CACHE_NAME);
> ccfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
> cfg.setCacheConfiguration(ccfg);
> return cfg;
> }

[jira] [Comment Edited] (IGNITE-6181) Tx is not rolled back on timeout leading to potential whole grid hang

2017-08-25 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16141996#comment-16141996
 ] 

Alexei Scherbakov edited comment on IGNITE-6181 at 8/25/17 6:52 PM:


"By design pessimistic transaction can detect timeout if some transaction 
related process is in progress (lock acquiring or prepare phase)"

This is exactly the thing needed to be somehow fixed.

If a transaction is started and abandoned for some case(e.g exception in the 
middle and not wrapping a transaction with try-with-resources block), it will 
lock all other transactions(without timeouts) which are trying to acquire locks 
held by hanging thread and will block exchange process.

I suggest to associate a TimeoutObject with a transaction having non-zero 
timeout.

Reopening the issue.


was (Author: ascherbakov):
"By design pessimistic transaction can detect timeout if some transaction 
related process is in progress (lock acquiring or prepare phase)"

This is exactly the thing needed to be somehow fixed.

If a transaction is started and abandoned for some case(e.g exception in the 
middle and not wrapping a transaction with try-with-resources block) without 
proper closing, it will lock all other transactions(without timeouts) which are 
trying to acquire locks held by hanging thread and will block exchange process.

I suggest to associate a TimeoutObject with a transaction having non-zero 
timeout.

Reopening the issue.

> Tx is not rolled back on timeout leading to potential whole grid hang
> -
>
> Key: IGNITE-6181
> URL: https://issues.apache.org/jira/browse/IGNITE-6181
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.1
>Reporter: Alexei Scherbakov
> Fix For: 2.3
>
>
> Unit test reproducer:
> {noformat}
> /*
>  * Licensed to the Apache Software Foundation (ASF) under one or more
>  * contributor license agreements.  See the NOTICE file distributed with
>  * this work for additional information regarding copyright ownership.
>  * The ASF licenses this file to You under the Apache License, Version 2.0
>  * (the "License"); you may not use this file except in compliance with
>  * the License.  You may obtain a copy of the License at
>  *
>  *  http://www.apache.org/licenses/LICENSE-2.0
>  *
>  * Unless required by applicable law or agreed to in writing, software
>  * distributed under the License is distributed on an "AS IS" BASIS,
>  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>  * See the License for the specific language governing permissions and
>  * limitations under the License.
>  */
> package org.apache.ignite.cache;
> import java.util.concurrent.CountDownLatch;
> import java.util.concurrent.TimeUnit;
> import java.util.concurrent.atomic.AtomicBoolean;
> import org.apache.ignite.Ignite;
> import org.apache.ignite.IgniteException;
> import org.apache.ignite.configuration.CacheConfiguration;
> import org.apache.ignite.configuration.IgniteConfiguration;
> import org.apache.ignite.configuration.TransactionConfiguration;
> import org.apache.ignite.internal.IgniteInternalFuture;
> import org.apache.ignite.internal.util.typedef.internal.U;
> import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
> import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;
> import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest;
> import org.apache.ignite.transactions.Transaction;
> import org.apache.ignite.transactions.TransactionConcurrency;
> import org.apache.ignite.transactions.TransactionIsolation;
> /**
>  * Tests ability to rollback not properly closed transaction.
>  */
> public class IgniteTxTimeoutTest extends GridCommonAbstractTest {
> /** */
> private static final long TX_TIMEOUT = 3_000L;
> /** */
> private static final String CACHE_NAME = "test";
> /** IP finder. */
> private static final TcpDiscoveryVmIpFinder IP_FINDER = new 
> TcpDiscoveryVmIpFinder(true);
> /** */
> private final CountDownLatch l = new CountDownLatch(1);
> /** */
> private final Object mux = new Object();
> /** {@inheritDoc} */
> @Override protected IgniteConfiguration getConfiguration(String 
> igniteInstanceName) throws Exception {
> IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName);
> cfg.setDiscoverySpi(new TcpDiscoverySpi().setIpFinder(IP_FINDER));
> TransactionConfiguration txCfg = new TransactionConfiguration();
> txCfg.setDefaultTxTimeout(TX_TIMEOUT);
> cfg.setTransactionConfiguration(txCfg);
> CacheConfiguration ccfg = new CacheConfiguration(CACHE_NAME);
> ccfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
> cfg.setCacheConfiguration(ccfg);
> return 

[jira] [Comment Edited] (IGNITE-6181) Tx is not rolled back on timeout leading to potential whole grid hang

2017-08-25 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16141996#comment-16141996
 ] 

Alexei Scherbakov edited comment on IGNITE-6181 at 8/25/17 6:51 PM:


"By design pessimistic transaction can detect timeout if some transaction 
related process is in progress (lock acquiring or prepare phase)"

This is exactly the thing needed to be somehow fixed.

If a transaction is started and abandoned for some case(e.g exception in the 
middle and not wrapping a transaction with try-with-resources block) without 
proper closing, it will lock all other transactions(without timeouts) which are 
trying to acquire locks held by hanging thread and will block exchange process.

I suggest to associate a TimeoutObject with a transaction having non-zero 
timeout.

Reopening the issue.


was (Author: ascherbakov):
"By design pessimistic transaction can detect timeout if some transaction 
related process is in progress (lock acquiring or prepare phase)"

This is exactly the thing needed to be somehow fixed.

If a transaction is started and abandoned for some case(e.g exception in the 
middle) without proper closing, it will lock all other transactions(without 
timeouts) which are trying to acquire locks held by hanging thread and will 
block exchange process.

I suggest to associate a TimeoutObject with a transaction having non-zero 
timeout.

Reopening the issue.

> Tx is not rolled back on timeout leading to potential whole grid hang
> -
>
> Key: IGNITE-6181
> URL: https://issues.apache.org/jira/browse/IGNITE-6181
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.1
>Reporter: Alexei Scherbakov
> Fix For: 2.3
>
>
> Unit test reproducer:
> {noformat}
> /*
>  * Licensed to the Apache Software Foundation (ASF) under one or more
>  * contributor license agreements.  See the NOTICE file distributed with
>  * this work for additional information regarding copyright ownership.
>  * The ASF licenses this file to You under the Apache License, Version 2.0
>  * (the "License"); you may not use this file except in compliance with
>  * the License.  You may obtain a copy of the License at
>  *
>  *  http://www.apache.org/licenses/LICENSE-2.0
>  *
>  * Unless required by applicable law or agreed to in writing, software
>  * distributed under the License is distributed on an "AS IS" BASIS,
>  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>  * See the License for the specific language governing permissions and
>  * limitations under the License.
>  */
> package org.apache.ignite.cache;
> import java.util.concurrent.CountDownLatch;
> import java.util.concurrent.TimeUnit;
> import java.util.concurrent.atomic.AtomicBoolean;
> import org.apache.ignite.Ignite;
> import org.apache.ignite.IgniteException;
> import org.apache.ignite.configuration.CacheConfiguration;
> import org.apache.ignite.configuration.IgniteConfiguration;
> import org.apache.ignite.configuration.TransactionConfiguration;
> import org.apache.ignite.internal.IgniteInternalFuture;
> import org.apache.ignite.internal.util.typedef.internal.U;
> import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
> import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;
> import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest;
> import org.apache.ignite.transactions.Transaction;
> import org.apache.ignite.transactions.TransactionConcurrency;
> import org.apache.ignite.transactions.TransactionIsolation;
> /**
>  * Tests ability to rollback not properly closed transaction.
>  */
> public class IgniteTxTimeoutTest extends GridCommonAbstractTest {
> /** */
> private static final long TX_TIMEOUT = 3_000L;
> /** */
> private static final String CACHE_NAME = "test";
> /** IP finder. */
> private static final TcpDiscoveryVmIpFinder IP_FINDER = new 
> TcpDiscoveryVmIpFinder(true);
> /** */
> private final CountDownLatch l = new CountDownLatch(1);
> /** */
> private final Object mux = new Object();
> /** {@inheritDoc} */
> @Override protected IgniteConfiguration getConfiguration(String 
> igniteInstanceName) throws Exception {
> IgniteConfiguration cfg = super.getConfiguration(igniteInstanceName);
> cfg.setDiscoverySpi(new TcpDiscoverySpi().setIpFinder(IP_FINDER));
> TransactionConfiguration txCfg = new TransactionConfiguration();
> txCfg.setDefaultTxTimeout(TX_TIMEOUT);
> cfg.setTransactionConfiguration(txCfg);
> CacheConfiguration ccfg = new CacheConfiguration(CACHE_NAME);
> ccfg.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
> cfg.setCacheConfiguration(ccfg);
> return cfg;
> }
> /** */
> public 

[jira] [Comment Edited] (IGNITE-6181) Tx is not rolled back on timeout leading to potential whole grid hang

2017-08-25 Thread Andrey Gura (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16141718#comment-16141718
 ] 

Andrey Gura edited comment on IGNITE-6181 at 8/25/17 3:00 PM:
--

[~ascherbakov] Test is incorrect in a manner. By design pessimistic transaction 
can detect timeout if some transaction related process is in progress (lock 
acquiring or prepare phase). In your test transaction does nothing because it 
has locked entry already but can't reach {{tx.commit()}} statement, so there is 
no any entity that can decide that time out happened. 

It means that if user's thread executes some long running job in the same 
thread in which transaction was initialized that timeout will not detected 
until any other transactional operation will be invoked (cache write or 
commit). 

The simplest test for pessimistic transaction timeout is the following:

{code:java}
public void testTxTimeout() throws Exception {
final Ignite ignite = startGrid();

try (Transaction tx = ignite.transactions().txStart()) {
ignite.cache(CACHE_NAME).put(1, 1);

U.sleep(TX_TIMEOUT + 1000L);

tx.commit();

fail();
}
catch (IgniteException e) {
// Expect exception - tx is rolled back.
}
}
{code}


was (Author: agura):
[~ascherbakov] Test is incorrect in some in a manner. By design pessimistic 
transaction can detect timeout if some transaction related process is in 
progress (lock acquiring or prepare phase). In your test transaction does 
nothing because it has locked entry already but can't reach {{tx.commit()}} 
statement, so there is no any entity that can decide that time out happened. 

It means that if user's thread executes some long running job in the same 
thread in which transaction was initialized that timeout will not detected 
until any other transactional operation will be invoked (cache write or 
commit). 

The simplest test for pessimistic transaction timeout is the following:

{code:java}
public void testTxTimeout() throws Exception {
final Ignite ignite = startGrid();

try (Transaction tx = ignite.transactions().txStart()) {
ignite.cache(CACHE_NAME).put(1, 1);

U.sleep(TX_TIMEOUT + 1000L);

tx.commit();

fail();
}
catch (IgniteException e) {
// Expect exception - tx is rolled back.
}
}
{code}

> Tx is not rolled back on timeout leading to potential whole grid hang
> -
>
> Key: IGNITE-6181
> URL: https://issues.apache.org/jira/browse/IGNITE-6181
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.1
>Reporter: Alexei Scherbakov
> Fix For: 2.3
>
>
> Unit test reproducer:
> {noformat}
> /*
>  * Licensed to the Apache Software Foundation (ASF) under one or more
>  * contributor license agreements.  See the NOTICE file distributed with
>  * this work for additional information regarding copyright ownership.
>  * The ASF licenses this file to You under the Apache License, Version 2.0
>  * (the "License"); you may not use this file except in compliance with
>  * the License.  You may obtain a copy of the License at
>  *
>  *  http://www.apache.org/licenses/LICENSE-2.0
>  *
>  * Unless required by applicable law or agreed to in writing, software
>  * distributed under the License is distributed on an "AS IS" BASIS,
>  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>  * See the License for the specific language governing permissions and
>  * limitations under the License.
>  */
> package org.apache.ignite.cache;
> import java.util.concurrent.CountDownLatch;
> import java.util.concurrent.TimeUnit;
> import java.util.concurrent.atomic.AtomicBoolean;
> import org.apache.ignite.Ignite;
> import org.apache.ignite.IgniteException;
> import org.apache.ignite.configuration.CacheConfiguration;
> import org.apache.ignite.configuration.IgniteConfiguration;
> import org.apache.ignite.configuration.TransactionConfiguration;
> import org.apache.ignite.internal.IgniteInternalFuture;
> import org.apache.ignite.internal.util.typedef.internal.U;
> import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
> import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;
> import org.apache.ignite.testframework.junits.common.GridCommonAbstractTest;
> import org.apache.ignite.transactions.Transaction;
> import org.apache.ignite.transactions.TransactionConcurrency;
> import org.apache.ignite.transactions.TransactionIsolation;
> /**
>  * Tests ability to rollback not properly closed transaction.
>  */
> public class IgniteTxTimeoutTest extends GridCommonAbstractTest {
> /** */
> private static final long TX_TIMEOUT = 3_000L;
>