[jira] [Commented] (FLINK-3948) EventTimeWindowCheckpointingITCase Fails with Core Dump

2016-06-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316023#comment-15316023
 ] 

ASF GitHub Bot commented on FLINK-3948:
---

Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/2072
  
Looks great, +1 to merge


> EventTimeWindowCheckpointingITCase Fails with Core Dump
> ---
>
> Key: FLINK-3948
> URL: https://issues.apache.org/jira/browse/FLINK-3948
> Project: Flink
>  Issue Type: Bug
>  Components: state backends
>Affects Versions: 1.1.0
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>Priority: Critical
>
> It fails because of a core dump in RocksDB. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3948) EventTimeWindowCheckpointingITCase Fails with Core Dump

2016-06-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15315477#comment-15315477
 ] 

ASF GitHub Bot commented on FLINK-3948:
---

GitHub user aljoscha opened a pull request:

https://github.com/apache/flink/pull/2072

[FLINK-3948] Protect RocksDB cleanup by cleanup lock

Before, an asynchronous checkpoint could still be in progress when cleanup
was attempted. Now cleanup and asynchronous checkpointing are protected by
a lock.

This is what caused `EventTimeWindowCheckpointingITCase` to fail. I have now
run it more than 100 times on Travis and haven't observed a build failure
related to this.
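
A minimal sketch of the locking pattern described above (illustration only,
not the actual PR code; the names `cleanupLock`, `dispose()` and
`materializeSnapshot()` are placeholders):

{code}
// Sketch of the idea, not the Flink implementation: cleanup and asynchronous
// checkpoint materialization share one lock, so a dispose can never run while
// a snapshot is still reading native RocksDB state.
public class RocksDbCleanupSketch {

    private final Object cleanupLock = new Object();
    private volatile boolean disposed = false;

    /** Called when the operator is closed or fails. */
    public void dispose() {
        synchronized (cleanupLock) {
            if (!disposed) {
                disposed = true;
                // Safe to release native RocksDB resources here:
                // no asynchronous checkpoint can be in progress.
            }
        }
    }

    /** Called from the asynchronous checkpointing thread. */
    public void materializeSnapshot() {
        synchronized (cleanupLock) {
            if (disposed) {
                return; // Backend already cleaned up, skip the snapshot.
            }
            // Safe to read native RocksDB state here.
        }
    }
}
{code}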

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aljoscha/flink rocksdb/fix-core-dump

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2072.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2072


commit c8456b45c47e67cc316d5bb979de36a6225eebd4
Author: Aljoscha Krettek 
Date:   2016-06-04T05:59:48Z

Revert "[FLINK-3960] ignore EventTimeWindowCheckpointingITCase for now"

This reverts commit 98a939552e12fc699ff39111bbe877e112460ceb.

commit 13c8593ec9074aa086caf4329b21e331a1c54d58
Author: Aljoscha Krettek 
Date:   2016-05-20T20:37:14Z

[FLINK-3948] Protect RocksDB cleanup by cleanup lock

Before, an asynchronous checkpoint could still be in progress when cleanup
was attempted. Now cleanup and asynchronous checkpointing are protected by
a lock.




> EventTimeWindowCheckpointingITCase Fails with Core Dump
> ---
>
> Key: FLINK-3948
> URL: https://issues.apache.org/jira/browse/FLINK-3948
> Project: Flink
>  Issue Type: Bug
>  Components: state backends
>Affects Versions: 1.1.0
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>Priority: Critical
>
> It fails because of a core dump in RocksDB. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3948) EventTimeWindowCheckpointingITCase Fails with Core Dump

2016-05-21 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15294781#comment-15294781
 ] 

Aljoscha Krettek commented on FLINK-3948:
-

RocksDB seems to be somewhat sensitive to the environment and configuration. I 
changed the configuration to this:

{code}
private static class RocksDbOptionsFactory implements OptionsFactory {

    final long targetFileSize = 100;
    final long writeBufferSize = 100;

    @Override
    public DBOptions createDBOptions(DBOptions currentOptions) {
        currentOptions
                .setMaxBackgroundCompactions(1)
                .setMaxBackgroundFlushes(1)
                .setMaxOpenFiles(1);
        return currentOptions;
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions) {
        currentOptions
                .setTargetFileSizeBase(targetFileSize)
                .setMaxBytesForLevelBase(4 * targetFileSize)
                .setWriteBufferSize(writeBufferSize)
                .setMinWriteBufferNumberToMerge(1)
                .setMaxWriteBufferNumber(1);
        return currentOptions;
    }
}
{code}
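
For context, a sketch of how such an options factory could be wired into a job
(illustration only; it assumes `RocksDBStateBackend#setOptions(OptionsFactory)`
and that the `RocksDbOptionsFactory` class above is visible):

{code}
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RocksDbBackendWiring {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Register the tuned options from the snippet above on the RocksDB backend.
        // The checkpoint URI is a placeholder for this sketch.
        RocksDBStateBackend backend = new RocksDBStateBackend("file:///tmp/checkpoints");
        backend.setOptions(new RocksDbOptionsFactory());

        env.setStateBackend(backend);
    }
}
{code}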

And now I even get this on my local machine:
{code}
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0001288fb173, pid=53485, tid=62699
#
# JRE version: Java(TM) SE Runtime Environment (8.0_40-b25) (build 1.8.0_40-b25)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.40-b25 mixed mode bsd-amd64 
compressed oops)
# Problematic frame:
# C  [librocksdbjni2649341092967859180..jnilib+0xc0173]  
rocksdb::TableCache::FindTable(rocksdb::EnvOptions const&, 
rocksdb::InternalKeyComparator const&, rocksdb::FileDescriptor const&, 
rocksdb::Cache::Handle**, bool, bool, rocksdb::HistogramImpl*)+0x93
#
# Failed to write core dump. Core dumps have been disabled. To enable core 
dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Users/aljoscha/Dev/work/flink/flink-tests/hs_err_pid53485.log
[thread 25603 also had an error]
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
{code}

> EventTimeWindowCheckpointingITCase Fails with Core Dump
> ---
>
> Key: FLINK-3948
> URL: https://issues.apache.org/jira/browse/FLINK-3948
> Project: Flink
>  Issue Type: Bug
>  Components: state backends
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>Priority: Critical
>
> It fails because of a core dump in RocksDB. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)