Re: [VOTE] Release 1.4.1, release candidate #1

2018-02-14 Thread Tzu-Li (Gordon) Tai
+1 (binding)

- Built from source, tests pass.
- Tested the now-shaded Elasticsearch connector in a cluster execution. Logs 
are correctly forwarded to the TM logs. Checked that all dependencies are 
correctly shaded.
- Successfully restored a window operator job’s savepoint from 1.3.2 in 1.4.1.
- Kerberos on YARN using keytabs works without issues.
- The list of staged sources / binaries looks good. No staged Maven artifacts 
are missing.

Regarding the Network Buffer changes:
I ran the usual chaos monkey test we do for major releases on AWS, but modified 
so that the job fails randomly on some message (instead of shooting down TMs).
Everything seems to be working without issues, so I assume we’re good to go on 
this front.

Cheers,
Gordon
On 15 February 2018 at 12:57:14 AM, Timo Walther (twal...@apache.org) wrote:

+1 (binding)  

- I scanned the changes  
- Ran some example table programs  

Looks good from my side.  

Regards,  
Timo  


On 2/14/18 at 5:46 PM, Aljoscha Krettek wrote:  
> +1 (binding)  
>  
> - I checked the signatures and hashes  
> - I ran a cluster and tried some example programs  
> - I checked the list of all changes: we're good, legally, and the other 
> changes also all look good  
>  
> The only thing I'm not sure about is the Network Buffer changes, but I don't 
> think they make things worse.  
>  
> Best,  
> Aljoscha  
>  
>> On 13. Feb 2018, at 19:19, Fabian Hueske  wrote:  
>>  
>> Hi Alex,  
>>  
>> FLINK-6352 seems to be an improvement / new feature.  
>> These are included in minor releases like 1.5.0 but not in bug fix releases  
>> such as 1.4.1.  
>>  
>> Best, Fabian  
>>  
>> 2018-02-13 19:08 GMT+01:00 Alexandru Gutan :  
>>  
>>> Hi everyone,  
>>>  
>>> Is [FLINK-6352] [kafka] Timestamp-based offset configuration for  
>>> FlinkKafkaConsumer going to make it into 1.4.1? Or is it left for 1.5?  
>>>  
>>> Best,  
>>> Alex.  
>>>  
>>> On 13 February 2018 at 19:05, Fabian Hueske  wrote:  
>>>  
 Hi,  
  
 I checked the diff between Flink 1.4.0 and Flink 1.4.1-RC1 for added/changed 
 dependencies.  
 I only found a version update of the Snappy dependency. Both the previous 
 and the new versions are ASL, so this is fine.  
  
 The signatures and hashes are OK.  
 I also built the binary release and found no issues, i.e., mvn -DskipTests 
 clean install finished successfully.  
  
 +1 (binding) to release RC1  
  
 @Gordon, thanks for taking care of the release!  
  
 Best, Fabian  
  
 2018-02-10 15:21 GMT+01:00 Tzu-Li (Gordon) Tai :  
  
> Hi everyone,  
>  
> Please review and vote on release candidate #1 for Flink 1.4.1, as  
> follows:  
> [ ] +1, Approve the release  
> [ ] -1, Do not approve the release (please provide specific comments)  
>  
> The complete staging area is available for your review, which includes:  
> * JIRA release notes [1],  
> * the official Apache source release and binary convenience releases to be  
> deployed to dist.apache.org [2], which are signed with the key with  
> fingerprint 1C1E2394D3194E1944613488F320986D35C33D6A [3],  
> * all artifacts to be deployed to the Maven Central Repository [4],  
> * source code tag “release-1.4.1-rc1” [5],  
> * website pull request listing the new release [6].  
> * A complete list of all new commits in release-1.4.1-rc1, since  
> release-1.4.0 [7]  
>  
> The vote will be open for at least 72 hours.  
> Please test the release and vote for the release candidate before  
> Wednesday (February 14th), 7pm CET.  
> It is adopted by majority approval, with at least 3 PMC affirmative votes.  
> Thanks,  
> Gordon  
>  
> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12342212  
> [2] http://people.apache.org/~tzulitai/flink-1.4.1-rc1/  
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS  
> [4] https://repository.apache.org/content/repositories/orgapacheflink-1145  
> [5] https://git-wip-us.apache.org/repos/asf?p=flink.git;a=commit;h=dbab73e5d097518838283a7a4f056cca0ac3f9a5  
> [6] https://github.com/apache/flink-web/pull/101  
>  
> [7]  
> * dbab73e5d0 - Commit for release 1.4.1 (17 hours ago)  
> * beff62d2e6 - [FLINK-8571] [DataStream] Introduce utility function that 
> reinterprets a data stream as keyed stream (backport from 1.5 branch) (17 
> hours ago)  
> * 33ebc85c28 - [FLINK-8362] [elasticsearch] Further improvements for 
> Elasticsearch connector shading (2 days ago)  
> * 0c53e798cb - [FLINK-8362][elasticsearch] shade all dependencies (2 days 
> ago)  
> * 5f9e367be6 - [FLINK-7760] Fix deserialization of 

[jira] [Created] (FLINK-8660) Enable the user to provide custom HAServices implementation

2018-02-14 Thread JIRA
Krzysztof Białek created FLINK-8660:
---

 Summary: Enable the user to provide custom HAServices 
implementation 
 Key: FLINK-8660
 URL: https://issues.apache.org/jira/browse/FLINK-8660
 Project: Flink
  Issue Type: Improvement
  Components: Cluster Management, Configuration, Distributed 
Coordination
Affects Versions: 1.4.0
Reporter: Krzysztof Białek
 Fix For: 1.5.0, 1.4.2


At the moment Flink uses ZooKeeper as HA backend.

The goal of this improvement is to make Flink support more HA backends, 
including ones maintained as independent projects.

The following changes are required to achieve it:
 # Add {{HighAvailabilityServicesFactory}} interface
 # Add new option {{HighAvailabilityMode.CUSTOM}}
 # Add new configuration property {{high-availability.factoryClass}}
 # Use the factory in {{HighAvailabilityServicesUtils}} to instantiate  
{{HighAvailabilityServices}}
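
A minimal sketch of what the factory could look like (the interface name and 
config key come from this ticket; the method signature and package are 
assumptions, not a final API):

{code:java}
package org.apache.flink.runtime.highavailability;

import org.apache.flink.configuration.Configuration;

/**
 * Proposed SPI: Flink would load the class configured under
 * "high-availability.factoryClass" via reflection and call create()
 * to obtain the HighAvailabilityServices for the cluster.
 */
public interface HighAvailabilityServicesFactory {

    /** Creates the HighAvailabilityServices from the cluster configuration. */
    HighAvailabilityServices create(Configuration configuration) throws Exception;
}
{code}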



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: IO metrics

2018-02-14 Thread cw7k
 To use the "numBytesOutPerSecond" example, what exactly is being measured?  Is 
there an example app with usage of this metric?

Just to clarify, if I register this meter in BucketingSink, it will be ignored? 
 Does this mean I need to implement my own measurement mechanism in 
BucketingSink and set up a new meter for throughput?  Sample use case is a sink 
to the local filesystem and to cloud storage.
On Wednesday, February 14, 2018, 1:10:47 AM PST, Chesnay Schepler 
 wrote:  
 
 All metrics listed are automatically measured, you only need to 
configure the Prometheus reporter.

Note that we do not measure how much data a source is reading / a sink 
is writing (see FLINK-7286).
If you want to measure these you will have to modify the source code of 
the respective source/sink classes,
like the BucketingSink. Do make sure that your custom metrics have 
different names than the built-in ones
as otherwise they will be ignored.
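
For illustration, a minimal sketch of how a custom throughput meter could be 
registered in a patched sink (the class and metric name here are made up for 
the example; only the metric-group API is Flink's):

import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Meter;
import org.apache.flink.metrics.MeterView;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

public class InstrumentedSink extends RichSinkFunction<String> {

    // name must differ from built-ins like numBytesOut, or it will be ignored
    private transient Meter bytesWrittenPerSecond;

    @Override
    public void open(Configuration parameters) {
        // MeterView(60) reports a per-second rate averaged over a 60s window
        bytesWrittenPerSecond = getRuntimeContext().getMetricGroup()
                .meter("bytesWrittenPerSecond", new MeterView(60));
    }

    @Override
    public void invoke(String value) throws Exception {
        // ... write 'value' to the local filesystem / cloud storage here ...
        bytesWrittenPerSecond.markEvent(value.getBytes().length);
    }
}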

On 13.02.2018 20:37, cw7k wrote:
> Hi, couple questions on IO metrics listed 
> here:https://ci.apache.org/projects/flink/flink-docs-release-1.4/monitoring/metrics.html#io
>
> We're trying to get metrics such as throughput to filesystem sinks.  Are the 
> metrics listed on that page automatically recorded, and we just need to 
> retrieve them?  If so, would BucketingSink be the place to add metrics to be 
> visible in Prometheus?


  

[jira] [Created] (FLINK-8659) Add migration tests for Broadcast state.

2018-02-14 Thread Kostas Kloudas (JIRA)
Kostas Kloudas created FLINK-8659:
-

 Summary: Add migration tests for Broadcast state.
 Key: FLINK-8659
 URL: https://issues.apache.org/jira/browse/FLINK-8659
 Project: Flink
  Issue Type: Improvement
  Components: Streaming
Affects Versions: 1.5.0
Reporter: Kostas Kloudas
Assignee: Kostas Kloudas
 Fix For: 1.5.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8657) Fix incorrect description for external checkpoint vs savepoint

2018-02-14 Thread Sihua Zhou (JIRA)
Sihua Zhou created FLINK-8657:
-

 Summary: Fix incorrect description for external checkpoint vs 
savepoint
 Key: FLINK-8657
 URL: https://issues.apache.org/jira/browse/FLINK-8657
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.4.0
Reporter: Sihua Zhou
Assignee: Sihua Zhou
 Fix For: 1.5.0


I checked that externalized checkpoints also support rescaling, both in the 
code and in practice. But the docs still note that they "do not support Flink 
specific features like rescaling." 

I am not sure whether I have missed something; if so, please just close this 
issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Apache EU Roadshow CFP Closing Soon (23 February)

2018-02-14 Thread Sharan F

Hello Everyone

This is an initial reminder to let you all know that we are holding an 
Apache EU Roadshow co-located with FOSS Backstage in Berlin on 13th and 
14th June 2018. https://s.apache.org/tCHx


The Call for Proposals (CFP) for the Apache EU Roadshow is currently 
open and will close at the end of next week, so if you have been 
delaying making a submission because the closing date seemed a long way 
off, then it's time to start getting your proposals submitted.


So what are we looking for?
We will have 2 Apache Devrooms available during the 2-day Roadshow, so we 
are looking for projects, including incubating ones, to submit 
presentations, panel discussions, BoFs, or workshop proposals. The main 
focus of the Roadshow will be IoT, Cloud, Httpd and Tomcat, so if your 
project is involved in or around any of these technologies at Apache 
then we are very interested in hearing from you.


Community and collaboration is important at Apache, so if your project is 
interested in organising a project sprint, meetup or hackathon during 
the Roadshow, then please submit it in the CFP as we do have some space 
available to allocate for these.


If you want to submit a talk on open source community related 
topics such as the Apache Way, governance or legal aspects, then please 
submit these to the CFP for FOSS Backstage.


Tickets for the Apache EU Roadshow are included as part of the 
registration for FOSS Backstage, so to attend the Roadshow you will need 
to register for FOSS Backstage. Early Bird tickets are still available 
until the 21st February 2018.


Please see below for important URLs to remember:

-  To submit a CFP for the Apache EU Roadshow: http://apachecon.com/euroadshow18/ 


-  To submit a CFP for FOSS Backstage: 
https://foss-backstage.de/call-papers


-  To register to attend the Apache EU Roadshow and/or FOSS Backstage: 
https://foss-backstage.de/tickets


For further updates and information about the Apache EU Roadshow please 
check http://apachecon.com/euroadshow18/


Thanks
Sharan Foga, VP Apache Community Development


[jira] [Created] (FLINK-8656) Add CLI command for rescaling

2018-02-14 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-8656:


 Summary: Add CLI command for rescaling
 Key: FLINK-8656
 URL: https://issues.apache.org/jira/browse/FLINK-8656
 Project: Flink
  Issue Type: New Feature
  Components: Client
Affects Versions: 1.5.0
Reporter: Till Rohrmann
Assignee: Till Rohrmann
 Fix For: 1.5.0


The REST rescaling calls should be made accessible via the {{CliFrontend}}. In 
order to do that I propose to add a {{modify}} command to the {{CliFrontend}} 
to which we can pass a new parallelism.
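
A sketch of how this could be invoked from the command line (the syntax is 
illustrative of the proposal, not final):

{code}
./bin/flink modify <jobId> -p <newParallelism>
{code}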



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8655) Add a default keyspace to CassandraSink

2018-02-14 Thread Christopher Hughes (JIRA)
Christopher Hughes created FLINK-8655:
-

 Summary: Add a default keyspace to CassandraSink
 Key: FLINK-8655
 URL: https://issues.apache.org/jira/browse/FLINK-8655
 Project: Flink
  Issue Type: Improvement
Affects Versions: 1.4.0
Reporter: Christopher Hughes
 Fix For: 1.4.1


Currently, to use the CassandraPojoSink, it is necessary for a user to provide 
keyspace information on the desired POJOs using Datastax annotations.  This 
allows various POJOs to be written to multiple keyspaces while sinking 
messages, but prevents runtime flexibility.

For many developers, non-production environments may all share a single 
Cassandra instance, differentiated by keyspace names.  I propose adding a 
`defaultKeyspace(String keyspace)` method to the ClusterBuilder.  POJOs that 
lack a keyspace would be piped to the default. 
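
A hypothetical sketch of the proposed usage (`defaultKeyspace` does not exist 
yet and is the point of this ticket; the rest follows the existing connector 
API):

{code:java}
CassandraSink.addSink(pojoStream)
    .setClusterBuilder(
        new ClusterBuilder() {
            @Override
            protected Cluster buildCluster(Cluster.Builder builder) {
                return builder.addContactPoint("127.0.0.1").build();
            }
        }
        // proposed: POJOs without a keyspace annotation are written here
        .defaultKeyspace("dev_keyspace"))
    .build();
{code}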



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8654) Extend quickstart docs on how to submit jobs

2018-02-14 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-8654:
---

 Summary: Extend quickstart docs on how to submit jobs
 Key: FLINK-8654
 URL: https://issues.apache.org/jira/browse/FLINK-8654
 Project: Flink
  Issue Type: Improvement
  Components: Documentation, Quickstarts
Affects Versions: 1.5.0
Reporter: Chesnay Schepler


The quickstart documentation explains how to setup the project, build the jar 
and run things in the IDE, but neither explains how to submit the jar to a 
cluster nor guides the user to where he could find this information (like the 
CLI docs).

Additionally, the quickstart poms should also contain the commands for 
submitting the jar to a cluster, in particular how to select a main class if 
it wasn't set in the pom (the -c CLI flag).
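
For example (a sketch; the jar path and main class are placeholders):

{code}
./bin/flink run -c org.example.StreamingJob target/my-flink-job-1.0.jar
{code}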



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8653) Remove slot request timeout from SlotPool

2018-02-14 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-8653:


 Summary: Remove slot request timeout from SlotPool
 Key: FLINK-8653
 URL: https://issues.apache.org/jira/browse/FLINK-8653
 Project: Flink
  Issue Type: Improvement
  Components: Distributed Coordination
Affects Versions: 1.5.0
Reporter: Till Rohrmann
Assignee: Till Rohrmann
 Fix For: 1.5.0


After addressing FLINK-8643, we can further simplify the {{SlotPool}} by 
replacing the internal slot request timeout by the timeout given to 
{{SlotPool#allocateSlot}}. Since this request will time out on the 
{{ProviderAndOwner}} side anyway, we should do the same on the {{SlotPool}} 
side.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8652) Reduce log level of QueryableStateClient.getKvState() to DEBUG

2018-02-14 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-8652:


 Summary: Reduce log level of QueryableStateClient.getKvState() to 
DEBUG
 Key: FLINK-8652
 URL: https://issues.apache.org/jira/browse/FLINK-8652
 Project: Flink
  Issue Type: Improvement
  Components: Queryable State
Affects Versions: 1.5.0
Reporter: Fabian Hueske
 Fix For: 1.5.0


The {{QueryableStateClient}} logs each state request on {{INFO}} log level. 
This results in very verbose logging in most applications that use queryable 
state.

I propose to reduce the log level to {{DEBUG}}.

What do you think [~kkl0u], [~aljoscha]?
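
In the meantime, users can silence the client in their log4j configuration, 
along these lines (logger name assumed from the client's package):

{code}
log4j.logger.org.apache.flink.queryablestate.client.QueryableStateClient=WARN
{code}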



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Why are checkpoint failures so serious?

2018-02-14 Thread Aljoscha Krettek
Hi Ron,

Keep in mind, though, that this feature will only be available with the 
upcoming Flink 1.5. Just making sure you don't go looking for this and are 
surprised if you don't find it.

Best,
Aljoscha


> On 14. Feb 2018, at 10:20, Till Rohrmann  wrote:
> 
> Hi Ron,
> 
> you should be able to turn off the Task failure in case of a checkpoint
> failure by setting `ExecutionConfig.setFailTaskOnCheckpointError(false)`.
> This setting should change the behavior such that checkpoint failures will
> simply fail the distributed checkpoint.
> 
> Cheers,
> Till
> 
> On Tue, Feb 13, 2018 at 11:41 PM, Ron Crocker  wrote:
> 
>> What would it take to be a little more flexible in handling checkpoint
>> failures?
>> 
>> Right now I have a team that’s checkpointing into S3, via the
>> FsStateBackend and an appropriate URL. Sometimes these checkpoints fail.
>> They’re transient, though, and a retry would likely work.
>> 
>> However, when they fail, their job exits and restarts from the last
>> checkpoint. That’s fine, but I’d rather it tried again before failing, and
>> even after failing just keep running and do another checkpoint. Maybe this
>> is something that should be configurable - # of retries, failure strategy, …
>> 
>> Ron



Re: Support distinct aggregation over data stream on Table/SQL API

2018-02-14 Thread Fabian Hueske
Hi Rong,

Thanks for taking the initiative to improve the support for DISTINCT
aggregations!
I've made a pass over your design document and left a couple of comments.
I think it is a really good write up and serves as a good start.

IMO, the next steps could be to
1) continue and finalize the discussion on the design doc. Feel free to
open a new umbrella JIRA and link your doc there.
2) check which JIRAs are still relevant. Close or reorganize them according
to the plan in your design doc and make them subissues of the umbrella
issue.
3) add support for DISTINCT in SQL (e.g., the kind of query sketched below)
4) later extend the Table API to also support distinct aggregations
(this would be mostly API changes since the execution is solved before)
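
For concreteness, step 3 would enable queries of roughly this shape (a 
sketch; table and column names are made up):

SELECT userId, COUNT(DISTINCT product)
FROM Orders
GROUP BY userId, TUMBLE(rowtime, INTERVAL '1' HOUR)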

Let me know what you think.

Best, Fabian


2018-02-14 3:07 GMT+01:00 Rong Rong :

> Hi Community,
>
> We are working on support of distinct aggregators over data stream on
> Table/SQL API. Currently there seem to be many JIRAs related to
> distinct agg over stream use cases which are still pending (FLINK-6249,
> FLINK-6260, FLINK-5315, FLINK-6335, FLINK-6373, FLINK-6250, etc.) and I am
> having some concerns when trying to come up with a solution as there might
> be other use cases out there.
>
> I summarized a write-up and categorized the use cases into unbounded or
> bounded aggregations, and proposed a solution through modifying and adding
> new distinct aggregate functions using the UDAGG API with DataView. Please
> find it here
>  xTQB-mVmYfm6LsN2_NHgTCVmJI/edit?usp=sharing>.
>
> Any comments or suggestions are highly appreciated.
>
> Many Thanks,
> Rong
>


Re: Why are checkpoint failures so serious?

2018-02-14 Thread Till Rohrmann
Hi Ron,

you should be able to turn off the Task failure in case of a checkpoint
failure by setting `ExecutionConfig.setFailTaskOnCheckpointError(false)`.
This setting should change the behavior such that checkpoint failures will
simply fail the distributed checkpoint.
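
For reference, this is roughly how that would look in a job (a sketch against
the upcoming 1.5 API):

StreamExecutionEnvironment env =
    StreamExecutionEnvironment.getExecutionEnvironment();
// a failing checkpoint then only fails that checkpoint, not the tasks
env.getConfig().setFailTaskOnCheckpointError(false);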

Cheers,
Till

On Tue, Feb 13, 2018 at 11:41 PM, Ron Crocker  wrote:

> What would it take to be a little more flexible in handling checkpoint
> failures?
>
> Right now I have a team that’s checkpointing into S3, via the
> FsStateBackend and an appropriate URL. Sometimes these checkpoints fail.
> They’re transient, though, and a retry would likely work.
>
> However, when they fail, their job exits and restarts from the last
> checkpoint. That’s fine, but I’d rather it tried again before failing, and
> even after failing just keep running and do another checkpoint. Maybe this
> is something that should be configurable - # of retries, failure strategy, …
>
> Ron


[jira] [Created] (FLINK-8651) Add support for different event-time OVER windows in a query

2018-02-14 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-8651:


 Summary: Add support for different event-time OVER windows in a 
query
 Key: FLINK-8651
 URL: https://issues.apache.org/jira/browse/FLINK-8651
 Project: Flink
  Issue Type: New Feature
  Components: Table API & SQL
Affects Versions: 1.5.0
Reporter: Fabian Hueske


Right now, Table API and SQL queries only support multiple OVER window 
aggregations, but all OVER windows must be of the same type.

For example the following query is currently supported:
{code:java}
SELECT c, b, 
  COUNT(a) OVER (PARTITION BY c ORDER BY rowtime RANGE BETWEEN INTERVAL '1' 
SECOND PRECEDING AND CURRENT ROW),
  SUM(a) OVER (PARTITION BY c ORDER BY rowtime RANGE BETWEEN INTERVAL '1' 
SECOND PRECEDING AND CURRENT ROW)
FROM T1
{code}
If we changed the interval or partitioning attribute of the {{SUM(a)}} 
aggregation's window, the query could not be executed.

We can add support for multiple different windows by splitting the query and 
joining it back.
 This would require an optimizer rule that rewrites a plan from
{code:java}
IN -> OverAgg(window-A, window-B) -> OUT
{code}
to
{code:java}
                     /-OverAgg(window-A)-\
IN -> Calc(uniq-id)-<                     >-WindowJoin(uniq-id, rowtime) -> OUT
                     \-OverAgg(window-B)-/
{code}

The unique id should consist of three components: the timestamp, the parallel 
index of the function instance, and a counter that just wraps around. One of 
the aggregates can be projected to only the required fields and the window join 
would join on uniq-id and timestamp equality (when we support FOLLOWING 
boundaries, we would have to join on a time range).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: IO metrics

2018-02-14 Thread Chesnay Schepler
All metrics listed are automatically measured, you only need to 
configure the Prometheus reporter.
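
For example, along these lines in flink-conf.yaml (a sketch; the port choice 
is arbitrary):

metrics.reporters: prom
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9249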


Note that we do not measure how much data a source is reading / a sink 
is writing (see FLINK-7286).
If you want to measure these you will have to modify the source code of 
the respective source/sink classes,
like the BucketingSink. Do make sure that your custom metrics have 
different names than the built-in ones
as otherwise they will be ignored.

On 13.02.2018 20:37, cw7k wrote:

Hi, couple questions on IO metrics listed 
here:https://ci.apache.org/projects/flink/flink-docs-release-1.4/monitoring/metrics.html#io

We're trying to get metrics such as throughput to filesystem sinks.  Are the 
metrics listed on that page automatically recorded, and we just need to 
retrieve them?  If so, would BucketingSink be the place to add metrics to be 
visible in Prometheus?





Re: IO metrics

2018-02-14 Thread Till Rohrmann
Hi,

the metrics listed on the web page are registered for all tasks
automatically. Thus, you should be able to simply consume them by
configuring the respective Prometheus metric reporter. One thing to note is
that the metrics are reported per Task and not per logical operator. Thus,
if your source is chained with another upstream operator, then you can
retrieve the metrics for the chain of these operators.

Cheers,
Till

On Tue, Feb 13, 2018 at 8:37 PM, cw7k  wrote:

> Hi, couple questions on IO metrics listed here:https://ci.apache.org/
> projects/flink/flink-docs-release-1.4/monitoring/metrics.html#io
>
> We're trying to get metrics such as throughput to filesystem sinks.  Are
> the metrics listed on that page automatically recorded, and we just need to
> retrieve them?  If so, would BucketingSink be the place to add metrics to
> be visible in Prometheus?


Re: Batch job getting stuck

2018-02-14 Thread Amit Jain
Hi Timo,

Yes, we are using off-heap memory. Our YARN containers are set to use ~23G of
memory with two slots per container, and the YARN heap cutoff ratio is set to
0.6.

The jobs have normal memory usage; the problem here is not a temporary halt
but a permanent halt of the running jobs.

Task manager's log

2018-02-08 16:55:31,007 INFO
org.apache.flink.yarn.YarnTaskManagerRunner   -  JVM
Options:
2018-02-08 16:55:31,007 INFO
org.apache.flink.yarn.YarnTaskManagerRunner   -
-Xms9370m
2018-02-08 16:55:31,007 INFO
org.apache.flink.yarn.YarnTaskManagerRunner   -
-Xmx9370m


GC runs and memory usage on one of the used task managers:

Garbage Collection
Collector      Count    Time
PS_Scavenge    22,673   702,544
PS_MarkSweep   143      77,431

Memory: JVM (Heap/Non-Heap)
Type       Committed   Used      Maximum
Heap       9.11 GB     6.23 GB   9.11 GB
Non-Heap   1.73 GB     1.67 GB   -1 B
Total      10.8 GB     7.90 GB   9.11 GB


--
Thanks,
Amit


On Mon, Feb 12, 2018 at 9:50 PM, Timo Walther  wrote:

> Hi Amit,
>
> how is the memory consumption when the jobs get stuck? Is the Java GC
> active? Are you using off-heap memory?
>
> Regards,
> Timo
>
> On 2/12/18 at 10:10 AM, Amit Jain wrote:
>
> Hi,
>>
>> We have created a batch job where we are trying to merge a set of S3
>> directories in TextFormat with the old snapshot in Parquet format.
>>
>> We are running 50 such jobs daily and found that the progress of a few
>> random jobs gets stuck in between. We have gone through the logs of the
>> JobManager and TaskManager and could not find any useful information there.
>>
>> The important operators involved are: read using TextInputFormat, read
>> using HadoopInputFormat, FullOuterJoin, and write using our BucketingSink
>> code.
>>
>> Please help resolve this issue.
>>
>> Flink Version 1.3.2 deployed on Yarn Container
>>
>> --
>> Thanks,
>> Amit
>>
>>
>