[GitHub] incubator-tephra pull request #48: [TEPHRA-241] Introduce a way to limit the...

2017-09-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-tephra/pull/48


---


[jira] [Commented] (TEPHRA-241) Introduce a way to limit the size of a transaction

2017-09-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TEPHRA-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159800#comment-16159800
 ] 

ASF GitHub Bot commented on TEPHRA-241:
---

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-tephra/pull/48


> Introduce a way to limit the size of a transaction
> --
>
> Key: TEPHRA-241
> URL: https://issues.apache.org/jira/browse/TEPHRA-241
> Project: Tephra
>  Issue Type: Improvement
>  Components: api, manager
>Affects Versions: 0.12.0-incubating
>Reporter: Andreas Neumann
>Assignee: Andreas Neumann
> Fix For: 0.13.0-incubating
>
>
> When clients perform a huge number of writes in a short transaction, that can 
> result in huge change sets. For example, if a client performs 10M writes and 
> sends that change set over, that can easily be 1GB large. The transaction 
> manager will keep this in memory. It will also write this as an edit to the 
> transaction log.
> Assume it runs out of memory because the change set is too large. It crashes 
> and when it restarts, it will replay the log, load that huge change set 
> again, and crash again. 
> To prevent this kind of systemic failure, and to encourage developers to use 
> long transactions when performing many writes, we can introduce two new 
> properties in the configuration:
> - change set warn threshold: if a change set exceeds this size, a warning is 
> logged. 
> - change set reject threshold: if a change set exceeds this size, it is 
> rejected (canCommit will throw an exception) and that will fail the 
> transaction.
> Both thresholds should be Long.MAX_VALUE by default, to preserve existing 
> behavior after upgrade. 
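
The two thresholds described above can be sketched as follows. This is only a minimal illustration of the proposal, not the code from the pull request: the class names ChangeSetLimits and ChangeSetTooLargeException, the check() method, and the use of java.util.logging are all assumptions made for the example. Both limits default to Long.MAX_VALUE, so a default-configured instance never warns and never rejects.

```java
import java.util.logging.Logger;

/** Hypothetical exception type; the issue only says canCommit "will throw an exception". */
final class ChangeSetTooLargeException extends Exception {
  ChangeSetTooLargeException(String message) {
    super(message);
  }
}

/** Hypothetical holder for the warn and reject thresholds proposed in the issue. */
final class ChangeSetLimits {
  private static final Logger LOG = Logger.getLogger(ChangeSetLimits.class.getName());

  private final long warnThresholdBytes;
  private final long rejectThresholdBytes;

  /** Defaults preserve existing behavior: nothing is ever warned about or rejected. */
  ChangeSetLimits() {
    this(Long.MAX_VALUE, Long.MAX_VALUE);
  }

  ChangeSetLimits(long warnThresholdBytes, long rejectThresholdBytes) {
    this.warnThresholdBytes = warnThresholdBytes;
    this.rejectThresholdBytes = rejectThresholdBytes;
  }

  /**
   * Checks a change set size against both thresholds.
   *
   * @return true if a warning was logged, false if the size is within the warn threshold
   * @throws ChangeSetTooLargeException if the size exceeds the reject threshold
   */
  boolean check(long changeSetBytes) throws ChangeSetTooLargeException {
    if (changeSetBytes > rejectThresholdBytes) {
      throw new ChangeSetTooLargeException(
          "Change set of " + changeSetBytes + " bytes exceeds the reject threshold of "
              + rejectThresholdBytes + " bytes");
    }
    if (changeSetBytes > warnThresholdBytes) {
      LOG.warning("Change set of " + changeSetBytes + " bytes exceeds the warn threshold of "
          + warnThresholdBytes + " bytes");
      return true;
    }
    return false;
  }
}
```

In this sketch the check happens before the change set is kept in memory or written to the transaction log, so an oversized change set fails only its own transaction and cannot be replayed on restart.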



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (TEPHRA-241) Introduce a way to limit the size of a transaction

2017-09-09 Thread Andreas Neumann (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEPHRA-241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Neumann resolved TEPHRA-241.

Resolution: Fixed



[jira] [Assigned] (TEPHRA-253) TransactionProcessorTest is sometimes flaky

2017-09-09 Thread Andreas Neumann (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEPHRA-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Neumann reassigned TEPHRA-253:
--

Assignee: Andreas Neumann  (was: Poorna Chandra)

> TransactionProcessorTest is sometimes flaky
> ---
>
> Key: TEPHRA-253
> URL: https://issues.apache.org/jira/browse/TEPHRA-253
> Project: Tephra
>  Issue Type: Bug
>Affects Versions: 0.12.0-incubating
>Reporter: Andreas Neumann
>Assignee: Andreas Neumann
>
> The test sometimes fails as follows:
> {noformat}
> Running org.apache.tephra.hbase.coprocessor.TransactionProcessorTest
> Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.741 sec <<< 
> FAILURE!
> testFamilyDeleteTimestamp(org.apache.tephra.hbase.coprocessor.TransactionProcessorTest)
>   Time elapsed: 1.526 sec
> testTransactionStateCache(org.apache.tephra.hbase.coprocessor.TransactionProcessorTest)
>   Time elapsed: 0.053 sec
> testDataJanitorRegionScanner(org.apache.tephra.hbase.coprocessor.TransactionProcessorTest)
>   Time elapsed: 0.288 sec  <<< FAILURE!
> org.junit.internal.ArrayComparisonFailure: arrays first differed at element 
> [3]; expected:<4> but was:<1>
>   at 
> org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:50)
>   at org.junit.Assert.internalArrayEquals(Assert.java:473)
>   at org.junit.Assert.assertArrayEquals(Assert.java:294)
>   at org.junit.Assert.assertArrayEquals(Assert.java:305)
>   at 
> org.apache.tephra.hbase.coprocessor.TransactionProcessorTest.assertKeyValueMatches(TransactionProcessorTest.java:593)
>   at 
> org.apache.tephra.hbase.coprocessor.TransactionProcessorTest.assertKeyValueMatches(TransactionProcessorTest.java:585)
>   at 
> org.apache.tephra.hbase.coprocessor.TransactionProcessorTest.testDataJanitorRegionScanner(TransactionProcessorTest.java:190)
> {noformat}
> It is not clear what is causing this. Most likely the region server did not 
> have an up-to-date transaction state snapshot at the time of the flush (that 
> might be due to TEPHRA-239 or TEPHRA-249), or it might be a condition where 
> flush() has no effect because the region is already flushing. 
> Let's observe this and gather more information when/if it happens again. 





[jira] [Commented] (TEPHRA-253) TransactionProcessorTest is sometimes flaky

2017-09-09 Thread Andreas Neumann (JIRA)

[ 
https://issues.apache.org/jira/browse/TEPHRA-253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16160097#comment-16160097
 ] 

Andreas Neumann commented on TEPHRA-253:


The fix is to delay the flush until the transaction state is loaded. 
PR: https://github.com/apache/incubator-tephra/pull/54
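
One way to express "wait until the state is loaded" in a test is a bounded polling loop. The sketch below is an assumption about the general approach, not the change in the pull request: StateWaiter and awaitNonNull are hypothetical names, and the supplier stands in for whatever accessor the test uses to read the coprocessor's transaction state.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Hypothetical test helper: polls until a state object becomes available. */
final class StateWaiter {
  private StateWaiter() {
  }

  /**
   * Repeatedly reads the supplier until it returns a non-null value or the timeout elapses.
   *
   * @param stateSupplier reads the current state; returns null while it is not yet loaded
   * @param timeoutMillis maximum time to wait
   * @param pollMillis pause between two reads
   * @return the first non-null state observed
   * @throws TimeoutException if no state was observed within the timeout
   */
  static <T> T awaitNonNull(Supplier<T> stateSupplier, long timeoutMillis, long pollMillis)
      throws InterruptedException, TimeoutException {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (true) {
      T state = stateSupplier.get();
      if (state != null) {
        return state;
      }
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("State not loaded within " + timeoutMillis + " ms");
      }
      Thread.sleep(pollMillis);
    }
  }
}
```

A test would call awaitNonNull before triggering the flush, so that the flush never runs against a null transaction state; the timeout keeps a genuinely broken setup from hanging the build.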



[jira] [Commented] (TEPHRA-253) TransactionProcessorTest is sometimes flaky

2017-09-09 Thread Andreas Neumann (JIRA)

[ 
https://issues.apache.org/jira/browse/TEPHRA-253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16160061#comment-16160061
 ] 

Andreas Neumann commented on TEPHRA-253:


Suspicion confirmed. After changing Travis to dump the standard output of the 
test case, I see:
{noformat}
2017-09-09 19:18:16,851 - INFO  [main:o.a.h.h.r.RegionCoprocessorHost@196] - 
Load coprocessor org.apache.tephra.hbase.coprocessor.TransactionProcessor from 
HTD of TestRegionScanner successfully.
2017-09-09 19:18:16,868 - INFO  
[StoreOpener-fc704aec719b675f06e5d7bd12da85f0-1:o.a.h.h.r.c.CompactionConfiguration@85]
 - size [134217728, 9223372036854775807); files [3, 10); ratio 1.20; 
off-peak ratio 5.00; throttle point 2684354560; delete expired; major 
period 60480, major jitter 0.50
2017-09-09 19:18:16,883 - INFO  [main:o.a.h.h.r.HRegion@644] - Onlined 
fc704aec719b675f06e5d7bd12da85f0; next sequenceid=1
2017-09-09 19:18:16,883 - INFO  [main:o.a.t.h.c.TransactionProcessorTest@178] - 
Coprocessor is using transaction state: null
2017-09-09 19:18:16,926 - INFO  [main:o.a.t.h.c.TransactionProcessorTest@192] - 
Flushing region 
TestRegionScanner,,1504984696824.fc704aec719b675f06e5d7bd12da85f0.
2017-09-09 19:18:16,960 - INFO  [HDFSTransactionStateStorage 
STARTING:o.a.t.p.HDFSTransactionStateStorage@109] - Using snapshot dir 
/home/travis/build/apache/incubator-tephra/tephra-hbase-compat-0.96/target/junit6493752557205114158/junit8165179254738335598
2017-09-09 19:18:16,981 - INFO  [TransactionStateCache 
STARTING:o.a.t.p.HDFSTransactionStateStorage@185] - Read encoded transaction 
snapshot of 84 bytes
2017-09-09 19:18:16,984 - INFO  [TransactionStateCache 
STARTING:o.a.t.c.TransactionStateCache@166] - Transaction state reloaded with 
snapshot from 1504984695267
2017-09-09 19:18:17,393 - INFO  [main:o.a.h.h.r.DefaultStoreFlusher@88] - 
Flushed, sequenceid=37, memsize=5.9 K, hasBloomFilter=true, into tmp file 
hdfs://localhost:53322/home/travis/build/apache/incubator-tephra/tephra-hbase-compat-0.96/target/junit6493752557205114158/junit7077794411994061305/hbase/data/default/TestRegionScanner/fc704aec719b675f06e5d7bd12da85f0/.tmp/6e813e3b7af94e13afc9dc1303dda3f8
2017-09-09 19:18:17,415 - INFO  [main:o.a.h.h.r.HStore@770] - Added 
hdfs://localhost:53322/home/travis/build/apache/incubator-tephra/tephra-hbase-compat-0.96/target/junit6493752557205114158/junit7077794411994061305/hbase/data/default/TestRegionScanner/fc704aec719b675f06e5d7bd12da85f0/f/6e813e3b7af94e13afc9dc1303dda3f8,
 entries=36, sequenceid=37, filesize=2.2 K
2017-09-09 19:18:17,416 - INFO  [main:o.a.h.h.r.HRegion@1708] - Finished 
memstore flush of ~5.9 K/6048, currentsize=0/0 for region 
TestRegionScanner,,1504984696824.fc704aec719b675f06e5d7bd12da85f0. in 489ms, 
sequenceid=37, compaction requested=false
{noformat}
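Clearly, the flush begins before the transaction state is loaded: the 
coprocessor still reports a null transaction state at 19:18:16,883, the flush 
starts at 19:18:16,926, and the snapshot is only reloaded at 19:18:16,984. 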


[jira] [Updated] (TEPHRA-253) TransactionProcessorTest is sometimes flaky

2017-09-09 Thread Andreas Neumann (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEPHRA-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Neumann updated TEPHRA-253:
---
Fix Version/s: 0.13.0-incubating



[jira] [Commented] (TEPHRA-241) Introduce a way to limit the size of a transaction

2017-09-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TEPHRA-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159773#comment-16159773
 ] 

ASF GitHub Bot commented on TEPHRA-241:
---

Github user anew commented on the issue:

https://github.com/apache/incubator-tephra/pull/48
  
After rebasing on the latest master, another Travis build is running; waiting 
for that to finish.




[GitHub] incubator-tephra pull request #54: wip

2017-09-09 Thread anew
GitHub user anew opened a pull request:

https://github.com/apache/incubator-tephra/pull/54

wip

on hold

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/anew/incubator-tephra tephra-253

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-tephra/pull/54.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #54


commit 140c1f0aa621fb7ab7a80c087b5be293d3b68035
Author: anew 
Date:   2017-09-09T06:43:55Z

wip




---


[jira] [Commented] (TEPHRA-241) Introduce a way to limit the size of a transaction

2017-09-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TEPHRA-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159796#comment-16159796
 ] 

ASF GitHub Bot commented on TEPHRA-241:
---

Github user anew commented on the issue:

https://github.com/apache/incubator-tephra/pull/48
  
Same Travis failure. This appears to happen much more frequently with Java 
8. I will commit this now and try to fix the flaky test before the 0.13 release.




[GitHub] incubator-tephra issue #48: [TEPHRA-241] Introduce a way to limit the size o...

2017-09-09 Thread anew
Github user anew commented on the issue:

https://github.com/apache/incubator-tephra/pull/48
  
Same Travis failure. This appears to happen much more frequently with Java 
8. I will commit this now and try to fix the flaky test before the 0.13 release.


---