Kerberos for Streaming & Kafka

2016-03-24 Thread Eron Wright
Hi, Given the other thread about per-job Kerberos identity, now's a good time to discuss some problems with the current delegation-token approach, since the answer could bear on the per-job enhancement. Two problems: Delegation tokens expire. For a continuous streaming job to survive, the

Re: RichMapPartitionFunction - problems with collect

2016-03-24 Thread Chesnay Schepler
Haven't looked too deeply into this, but this sounds like object reuse is enabled, at which point buffering values effectively causes you to store the same value multiple times. Can you try disabling objectReuse using env.getConfig().disableObjectReuse()? On 22.03.2016 16:53, Sergio Ramírez
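
The aliasing effect described in this reply can be reproduced without Flink. The following is a minimal sketch, not Flink code: `Record`, `reusing_source`, `buffer_by_reference` and `buffer_by_copy` are hypothetical names introduced only for illustration. It assumes a runtime that hands the same mutable object to user code for every element, which is roughly what object reuse means.

```python
from dataclasses import dataclass, replace


@dataclass
class Record:
    value: int


def reusing_source(values):
    """Yield every element through ONE shared Record (simulated object reuse)."""
    shared = Record(0)
    for v in values:
        shared.value = v  # overwrite in place instead of allocating a new Record
        yield shared


def buffer_by_reference(records):
    # Stores references only: every entry is the same, repeatedly mutated object.
    return [r for r in records]


def buffer_by_copy(records):
    # Copies each record before storing it, so later overwrites do not leak in.
    return [replace(r) for r in records]


aliased = [r.value for r in buffer_by_reference(reusing_source([1, 2, 3]))]
copied = [r.value for r in buffer_by_copy(reusing_source([1, 2, 3]))]
print(aliased)  # the last value, repeated once per buffered element
print(copied)   # the values as they were emitted
```

Under that assumption, either copying each value before buffering it or turning object reuse off (as suggested above) should avoid the duplicated values.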

Re: Native iterations in PyFlink

2016-03-24 Thread Chesnay Schepler
Hello Shannon, you've picked yourself quite a feature there. The following classes will be relevant:
* Python
  o DataSet
  o OperationInfo
  o Environment (_send_operation method)
  o Constants._Identifier
* Java
  o PythonPlanBinder
  o PythonOperationInfo
An

Native iterations in PyFlink

2016-03-24 Thread Shannon Quinn
Hi all, I'm looking at Flink for highly iterative ALS-like distributed computations, and the concept of native iteration support was very attractive. However, I notice that the Python API is missing this item. I'd absolutely be interested in adding that component if someone could point me in

Re: RollingSink

2016-03-24 Thread Aljoscha Krettek
Hi, I’m not aware of anyone having tested the RollingSink with anything besides “hdfs://” and “file://”. That the file is empty is strange. Is something like revokeLease() necessary for your custom HCFS? Cheers, Aljoscha On Wed, 23 Mar 2016 at 17:53 Vijay Srinivasaraghavan

Re: RollingSink

2016-03-24 Thread Aljoscha Krettek
Hi, (sending from my other handle since the apache mail relay seems to be down for me) I’m not aware of anyone having tested the RollingSink with anything besides “hdfs://” and “file://”. That the file is empty is strange. Is something like revokeLease() necessary for your custom HCFS?

Re: Streaming KV store abstraction

2016-03-24 Thread Nam-Luc Tran
>Sorry for the late answer, I completely missed this email. (Thanks Robert for pointing out).
No problem ;)
>Now that you have everything set up, in flatMap1 (for events) you would query the state: state.value() and enrich your data
>in flatMap2 you would update the state: state.update(newState)
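
The pattern quoted above (one input reads the state to enrich events, the other input updates it) can be sketched in plain Python. This is an illustration only, not the Flink API: `ValueState` and `EnrichingCoFlatMap` are hypothetical stand-ins, the per-key dictionary is an assumption about how keyed state is scoped, and only the method names `value()` and `update()` come from the message.

```python
class ValueState:
    """Hypothetical stand-in for a per-key state handle with value()/update()."""

    def __init__(self):
        self._value = None

    def value(self):
        return self._value

    def update(self, new_value):
        self._value = new_value


class EnrichingCoFlatMap:
    """Two-input operator: input 1 carries events, input 2 carries state updates."""

    def __init__(self):
        self._states = {}  # assumption: one ValueState per key

    def _state(self, key):
        return self._states.setdefault(key, ValueState())

    def flat_map1(self, key, event):
        # Events: query the state and emit the enriched record.
        return [(key, event, self._state(key).value())]

    def flat_map2(self, key, new_state):
        # Updates: overwrite the state for this key and emit nothing.
        self._state(key).update(new_state)
        return []


op = EnrichingCoFlatMap()
out = []
out += op.flat_map1("user-1", "click")           # no state yet for this key
out += op.flat_map2("user-1", {"country": "BE"})
out += op.flat_map1("user-1", "purchase")        # enriched with the update
print(out)
```

An event that arrives before any update for its key is emitted with an empty state in this sketch; whether to buffer or drop such events instead is a design choice the quoted message does not settle.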

[jira] [Created] (FLINK-3667) Generalize client<->cluster communication

2016-03-24 Thread Maximilian Michels (JIRA)
Maximilian Michels created FLINK-3667:
Summary: Generalize client<->cluster communication
Key: FLINK-3667
URL: https://issues.apache.org/jira/browse/FLINK-3667
Project: Flink
Issue

[jira] [Created] (FLINK-3666) Remove Nephele references

2016-03-24 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-3666:
Summary: Remove Nephele references
Key: FLINK-3666
URL: https://issues.apache.org/jira/browse/FLINK-3666
Project: Flink
Issue Type: Improvement

[jira] [Created] (FLINK-3665) Range partitioning lacks support to define sort orders

2016-03-24 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-3665:
Summary: Range partitioning lacks support to define sort orders
Key: FLINK-3665
URL: https://issues.apache.org/jira/browse/FLINK-3665
Project: Flink
Issue

Apache Flink: aligning watermark among parallel tasks

2016-03-24 Thread Ozan DENİZ
We are using periodic event-time windows with watermarks. We currently have 4 parallel tasks in our Flink app. During the streaming process, the watermark values of all 4 tasks must be close to each other to trigger the window event. For example: Task 1 watermark value = 8, Task 2 watermark value =
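
The behaviour described in this question follows from how an operator with several inputs tracks event time: its watermark is the minimum of the watermarks received from its inputs, so one lagging task holds back every window. A minimal sketch of that rule, assuming a downstream window operator that receives watermarks from all four tasks; the function names and all watermark values other than Task 1's 8 are hypothetical.

```python
def operator_watermark(task_watermarks):
    """Event-time clock of an operator: the minimum over its input watermarks."""
    return min(task_watermarks)


def window_can_fire(window_max_timestamp, task_watermarks):
    """A window fires once the operator watermark reaches its last timestamp."""
    return operator_watermark(task_watermarks) >= window_max_timestamp


# Task 1 is at 8 (from the message); the other three values are made up.
lagging = [8, 3, 9, 10]
aligned = [8, 8, 9, 10]

print(operator_watermark(lagging), window_can_fire(7, lagging))
print(operator_watermark(aligned), window_can_fire(7, aligned))
```

In the first case the task at 3 keeps the window closed even though three tasks have already passed timestamp 7.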

Re: Proposal: YARN session per-job Kerberos authentication

2016-03-24 Thread Robert Metzger
Hi Stefano, I think the proposed feature is not limited to YARN sessions. With the code in place, standalone clusters would also allow us to authenticate file system access with the user who submitted the job. I would recommend that you do some prototyping and come up with a design document first.