[GitHub] flink issue #4525: [FLINK-7423] Always reuse an instance to get elements fro...

2017-08-21 Thread XuPingyong
Github user XuPingyong commented on the issue:

https://github.com/apache/flink/pull/4525
  
Thanks @StephanEwen ,  so I think `null` values can be returned from 
`InputFormat#nextRecord` anytime, even not end. 

Can `null` values be passed to `InputFormat#nextRecord`,  or checkNull(if 
`null`, give a new object) before passed?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4525: [FLINK-7423] Always reuse an instance to get elements fro...

2017-08-18 Thread StephanEwen
Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/4525
  
I think `null` values are not really permitted at the moment, but I think 
the InputFormats do not explicitly forbid them. That's why the logic is not 
"run until returns null", but "run until reachedEnd()".

The change here implements a mixed contract like "run until reachedEnd() or 
returns null", which is more complicated to document, enforce, test, and 
understand for users.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4525: [FLINK-7423] Always reuse an instance to get elements fro...

2017-08-18 Thread greghogan
Github user greghogan commented on the issue:

https://github.com/apache/flink/pull/4525
  
@StephanEwen why are `null` values permitted if not in the contract of the 
input formats?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4525: [FLINK-7423] Always reuse an instance to get elements fro...

2017-08-18 Thread StephanEwen
Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/4525
  
The logic is currently not correct with the contract of the input formats. 
A return value of null is not an "end of split" indicator.

Also, the description mentions that this adds a test, which I cannot find 
in the diff...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4525: [FLINK-7423] Always reuse an instance to get elements fro...

2017-08-17 Thread XuPingyong
Github user XuPingyong commented on the issue:

https://github.com/apache/flink/pull/4525
  
Thanks @greghogan , `MergeIterator` is really a special case that it would 
be better to pass the returned object to  `MutableObjectIterator#next` again.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4525: [FLINK-7423] Always reuse an instance to get elements fro...

2017-08-13 Thread greghogan
Github user greghogan commented on the issue:

https://github.com/apache/flink/pull/4525
  
`MutableObjectIterator#next` allows/requires a `null` result: "@return The 
object or null if the iterator is exhausted.". The documentation 
could be improved, e.g. in `MergeIterator` the returned object is not 
necessarily the immediately passed argument.

`DataSourceTask` looks to be the only `InputFormat` consumer re-passing a 
reuse object.

It seems the reason to allow returning `null` is handling, for example, a 
bad record at the end of a file, such that `reachedEnd` would have returned 
`false` but there is no record to return.

Object reuse is always optional so is not an issue for immutable types.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4525: [FLINK-7423] Always reuse an instance to get elements fro...

2017-08-12 Thread XuPingyong
Github user XuPingyong commented on the issue:

https://github.com/apache/flink/pull/4525
  
@greghogan, if the object passed to nextRecord may be reused internally 
by the InputFormat, do the similar cases need to  be re-considered?

In `DataSourceTask.java`:
  
  OT reuse = serializer.createInstance();

// as long as there is data to read
while (!this.taskCanceled && !format.reachedEnd()) {
OT returned;
if ((returned = format.nextRecord(reuse)) != null) {
output.collect(returned);
}
}

And in many batch drivers:
  
  final MutableObjectIterator in = taskContext.getInput(0);
  T value = serializer.createInstance();
  while (running && (value = in.next(value)) != null) {
  ...
  } 


In my opinion:
 1.  `Null` records are meaningless, but `null` is meaningful for input 
or format which means the end. If a user only call `InputFormat#nextRecord` 
without `InputFormat#reachedEnd`, only `null` can be returned. 
 2.  The returned object of `InputFormat#nextRecord` should not need to 
be considered that it may be passed again. If a immutable object is returned, 
an exception will be thrown  when it is reused again in 
`InputFormat#nextRecord`.

@greghogan, could you please offer some cases that the object passed to 
nextRecord can be reused internally by the InputFormat?  Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4525: [FLINK-7423] Always reuse an instance to get elements fro...

2017-08-12 Thread greghogan
Github user greghogan commented on the issue:

https://github.com/apache/flink/pull/4525
  
As I understand Flink does not allow `null` records since what does it mean 
to partition or window a null record? `null` fields are sometimes allowed but 
`null` records are meaningless.

The caller of `InputFormat#nextRecord` cannot reuse an object which has 
already been passed to `nextRecord` as a reusable object as this may result in 
an object being "used" twice. That's why object reuse always replaces the 
passed object with the returned object, which may be the same but could be 
different.

@kl0u could you offer some insight into FLINK-4075 adding the null check 
with break to `InputFormatSourceFunction#run`? Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4525: [FLINK-7423] Always reuse an instance to get elements fro...

2017-08-12 Thread XuPingyong
Github user XuPingyong commented on the issue:

https://github.com/apache/flink/pull/4525
  
Thanks @greghogan. 

  I think  the InputFormat can return anything, may be null or other 
immutable objects that cannot be passed to nextRecord.
 And I am also not clear why the object passed to nextRecord may be 
reused internally by the InputFormat. The param of nextRecord named reuse means 
it may be re-passed to this call again?

What do you think?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #4525: [FLINK-7423] Always reuse an instance to get elements fro...

2017-08-11 Thread greghogan
Github user greghogan commented on the issue:

https://github.com/apache/flink/pull/4525
  
Hi @XuPingyong thanks for submitting this PR. I'm not clear on why the 
`null` check was added in FLINK-4075 and `ContinuousFileProcessingTest` is not 
running locally for me. When calling `InputFormat#nextRecord` we can only reuse 
the returned object. The object passed to `nextRecord` may be reused internally 
by the `InputFormat` so we cannot simply pass in the same object. It would be 
better to move `OUT nextElement = serializer.createInstance();` into the outer 
loop.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---