Re: RocksDB error with flink 1.2.0

2017-05-05 Thread mclendenin
I ended up combining all the patterns into one giant CEP pattern and then filtering the output of that pattern instead. That way there was only one RocksDB instance, which led to large checkpoints instead of lots of small ones. This seems to work, but I still need to do more testing around
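The approach described above can be sketched outside of Flink. This is a hypothetical, plain-Python illustration (not the Flink CEP API): several detection patterns are run through a single multiplexed operator that tags each match with the pattern that produced it, and consumers filter the combined output downstream. The pattern names and event fields are invented for the example.

```python
# Hypothetical sketch (not Flink API): run several patterns in one
# operator and tag each match, instead of one operator (and one RocksDB
# instance) per pattern. Pattern names and event fields are illustrative.
patterns = {
    "high_temp": lambda e: e["temp"] > 90,
    "low_battery": lambda e: e["battery"] < 10,
}

def multiplexed_match(events):
    """Evaluate every pattern over the stream in a single pass, tagging matches."""
    for event in events:
        for name, predicate in patterns.items():
            if predicate(event):
                yield {"pattern": name, "event": event}

events = [
    {"temp": 95, "battery": 50},
    {"temp": 70, "battery": 5},
]

# Downstream, filter the combined output for the pattern you care about.
high_temp_alerts = [m for m in multiplexed_match(events)
                    if m["pattern"] == "high_temp"]
print(high_temp_alerts)
```

The trade-off matches what the thread reports: one shared state store and one (larger) checkpoint per operator, rather than hundreds of small independent ones.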

Re: RocksDB error with flink 1.2.0

2017-05-03 Thread Stephan Ewen
Multiplexing patterns seems like the right thing to do. Aside from not sharing rocksdb, having 300 separate operators also results in more threads, network connections, etc. That makes it all less efficient... On Tue, May 2, 2017 at 6:06 PM, Aljoscha Krettek wrote: > They

Re: RocksDB error with flink 1.2.0

2017-05-02 Thread Aljoscha Krettek
They can’t (with the current design of Flink) because each CEP pattern gets executed by a separate operator. We could think about doing multiplexing of several patterns inside one operator. It’s what I hinted at earlier as a possible solution when I mentioned that you could implement your own

Re: RocksDB error with flink 1.2.0

2017-05-02 Thread Elias Levy
Any reason they can't share a single RocksDB state backend instance? On Fri, Apr 28, 2017 at 8:44 AM, Aljoscha Krettek wrote: > The problem here is that this will try to open 300 RocksDB instances on > each of the TMs (depending on how the parallelism is spread between the

Re: RocksDB error with flink 1.2.0

2017-05-02 Thread Aljoscha Krettek
Hi, I think the bottleneck might be HDFS. With 300 operators at parallelism 6 you will have 1800 concurrent writes (i.e. connections) to HDFS, which might be too much for the master node and the worker nodes. This is the same problem that you had on the local filesystem but now in the
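The arithmetic behind the 1800 figure above is straightforward: every one of the 300 CEP operators runs at parallelism 6, and each subtask writes its own checkpoint stream.

```python
# Back-of-the-envelope check of the load described in the thread:
# 300 CEP operators, each at parallelism 6, each subtask writing its
# own checkpoint stream to HDFS at checkpoint time.
operators = 300
parallelism = 6
concurrent_hdfs_writes = operators * parallelism
print(concurrent_hdfs_writes)  # 1800 simultaneous connections
```

Against a 3-node HDFS cluster, as mentioned later in the thread, that is 600 concurrent streams per datanode at every checkpoint.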

Re: RocksDB error with flink 1.2.0

2017-04-28 Thread mclendenin
There are only 3 nodes in the HDFS cluster and when running fsck it shows the filesystem as healthy. $ hdfs fsck /user/hadoop/flink/checkpoints/dc2aee563bebce76e420029525c37892/chk-43/ 17/04/28 16:24:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using

Re: RocksDB error with flink 1.2.0

2017-04-28 Thread mclendenin
The top level exception is similar to the one in this Jira issue, but the root exception is different. That issue says it was fixed in 1.2.0, which is what I'm using: https://issues.apache.org/jira/browse/FLINK-5663

Re: RocksDB error with flink 1.2.0

2017-04-28 Thread mclendenin
This is the stacktrace I'm getting when checkpointing to HDFS. It happens about once every 3 checkpoints, and I don't see this without parallelism. AsynchronousException{java.lang.Exception: Could not materialize checkpoint 6 for operator KeyedCEPPatternOperator -> Flat Map -> Map -> Sink:

Re: RocksDB error with flink 1.2.0

2017-04-28 Thread Marcus Clendenin
I changed the max number of open files and got past this error, but now I'm seeing errors that it's unable to flush the file. I am checkpointing using HDFS; should I be using the local file system? Is there any better way to use CEP with multiple patterns, or are you suggesting creating my
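The "max number of open files" change referred to above is the per-process file-descriptor limit (`ulimit -n` on Linux). As a hedged illustration, the current soft and hard limits can be inspected from Python's standard library on Unix systems:

```python
# Inspect the per-process open-file limit that RocksDB runs into
# (the stdlib equivalent of `ulimit -n` on Unix). Raising the soft
# limit, as described in the thread, is typically done in the shell
# or service configuration, not from inside the job.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit: soft={soft}, hard={hard}")
```

Defaults such as 1024 or 4096 are easily exceeded when hundreds of RocksDB instances each hold many files open.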

Re: RocksDB error with flink 1.2.0

2017-04-28 Thread Aljoscha Krettek
The problem here is that this will try to open 300 RocksDB instances on each of the TMs (depending on how the parallelism is spread between the machines this could be more or less). As the exception says, this will open too many files because each RocksDB instance has a directory with several
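A rough, illustrative estimate of why 300 instances exhaust file handles follows; the files-per-instance number is an assumption for the sake of the example (a real RocksDB directory holds SST files, WAL logs, MANIFEST, etc., and grows with state size):

```python
# Rough illustration (assumed numbers) of why 300 RocksDB instances
# per TaskManager can exhaust the open-file limit.
instances_per_tm = 300          # one per CEP pattern, from the thread
assumed_files_per_instance = 20  # assumption for illustration only
open_files = instances_per_tm * assumed_files_per_instance
print(open_files)  # well past a default ulimit of 1024 or 4096
```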

RocksDB error with flink 1.2.0

2017-04-28 Thread mclendenin
Starting ~300 CEP patterns with parallelism of 6, since there are 6 partitions on a Kafka topic. Checkpointing with RocksDB to Hadoop on an interval of 50 seconds. Cluster is HA with 2 JMs and 5 TMs. Getting the following exception: java.io.IOException: Error creating ColumnFamilyHandle. at