[jira] [Commented] (FLINK-629) Add support for null values to the java api

2015-01-22 Thread mustafa elbehery (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287392#comment-14287392
 ] 

mustafa elbehery commented on FLINK-629:


It's on Flink 0.9. I have to stick with Flink 0.8.0 for now; I cannot upgrade yet.

 Add support for null values to the java api
 ---

 Key: FLINK-629
 URL: https://issues.apache.org/jira/browse/FLINK-629
 Project: Flink
  Issue Type: Improvement
  Components: Java API
Reporter: Stephan Ewen
Assignee: Gyula Fora
Priority: Critical
  Labels: github-import
 Fix For: pre-apache

 Attachments: Selection_006.png, SimpleTweetInputFormat.java, 
 Tweet.java, model.tar.gz


 Currently, many runtime operations fail when encountering a null value. Tuple 
 serialization should allow null fields.
 I suggest adding a method to the tuples called `getFieldNotNull()` which 
 throws a meaningful exception when the accessed field is null. That way, we 
 simplify the logic of operators that should not deal with null fields, like 
 key grouping or aggregations.
 Even though SQL allows grouping and aggregating of null values, I suggest to 
 exclude this from the java api, because the SQL semantics of aggregating null 
 fields are messy.
  Imported from GitHub 
 Url: https://github.com/stratosphere/stratosphere/issues/629
 Created by: [StephanEwen|https://github.com/StephanEwen]
 Labels: enhancement, java api, 
 Milestone: Release 0.5.1
 Created at: Wed Mar 26 00:27:49 CET 2014
 State: open
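
For illustration, a minimal sketch of the `getFieldNotNull()` accessor proposed in the description above, assuming it would live on the Tuple base class (the exception type and message here are placeholders, not the actual Flink API):

{code}
// Hypothetical sketch of the proposed accessor; not the actual Flink Tuple class.
public abstract class Tuple implements java.io.Serializable {

	/** Existing accessor: may return null. */
	public abstract <T> T getField(int pos);

	/** Proposed accessor: fails with a meaningful exception instead of returning null. */
	public <T> T getFieldNotNull(int pos) {
		T field = getField(pos);
		if (field == null) {
			throw new IllegalStateException("Field " + pos + " is null, but the operator "
					+ "(e.g. key grouping or an aggregation) requires a non-null value.");
		}
		return field;
	}
}
{code}

Operators that must not see null keys could then call getFieldNotNull() instead of getField() and fail fast with a clear message.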



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-1436) Command-line interface verbose option (-v)

2015-01-22 Thread Max Michels (JIRA)
Max Michels created FLINK-1436:
--

 Summary: Command-line interface verbose option (-v)
 Key: FLINK-1436
 URL: https://issues.apache.org/jira/browse/FLINK-1436
 Project: Flink
  Issue Type: Improvement
  Components: Start-Stop Scripts
Reporter: Max Michels
Priority: Trivial


Let me run just a basic Flink job and add the verbose flag. It's a general 
option, so let me add it as a first parameter:

 ./flink -v run ../examples/flink-java-examples-0.8.0-WordCount.jar 
 hdfs:///input hdfs:///output9
Invalid action!
./flink ACTION [GENERAL_OPTIONS] [ARGUMENTS]
  general options:
 -h,--help  Show the help for the CLI Frontend.
 -v,--verbose   Print more detailed error messages.

Action run compiles and runs a program.

  Syntax: run [OPTIONS] jar-file arguments
  run action arguments:
 -c,--class classname   Class with the program entry point (main
  method or getPlan() method. Only needed
  if the JAR file does not specify the class
  in its manifest.
 -m,--jobmanager host:port  Address of the JobManager (master) to
  which to connect. Use this flag to connect
  to a different JobManager than the one
  specified in the configuration.
 -p,--parallelism parallelism   The parallelism with which to run the
  program. Optional flag to override the
  default value specified in the
  configuration.

Action info displays information about a program.
  info action arguments:
 -c,--class classname   Class with the program entry point (main
  method or getPlan() method. Only needed
  if the JAR file does not specify the class
  in its manifest.
 -e,--executionplan   Show optimized execution plan of the
  program (JSON)
 -m,--jobmanager host:port  Address of the JobManager (master) to
  which to connect. Use this flag to connect
  to a different JobManager than the one
  specified in the configuration.
 -p,--parallelism parallelism   The parallelism with which to run the
  program. Optional flag to override the
  default value specified in the
  configuration.

Action list lists running and finished programs.
  list action arguments:
 -m,--jobmanager host:port   Address of the JobManager (master) to which
   to connect. Use this flag to connect to a
   different JobManager than the one specified
   in the configuration.
 -r,--running  Show running programs and their JobIDs
 -s,--scheduled        Show scheduled programs and their JobIDs

Action cancel cancels a running program.
  cancel action arguments:
 -i,--jobid jobIDJobID of program to cancel
 -m,--jobmanager host:port   Address of the JobManager (master) to which
   to connect. Use this flag to connect to a
   different JobManager than the one specified
   in the configuration.

What just happened? This results in a lot of output which is usually generated 
if you use the --help option on command-line tools. If your terminal window is 
large enough, then you will see a tiny message:

Please specify an action. I did specify an action. Strange. If you read the 
help messages carefully, you see that general options belong to the action.

 ./flink run -v ../examples/flink-java-examples-0.8.0-WordCount.jar 
 hdfs:///input hdfs:///output9

For the sake of mitigating user frustration, let us also accept -v as the first 
argument. It may seem trivial for the day-to-day Flink user but makes a 
difference for a novice.
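
As a rough illustration of the suggestion, here is a minimal, self-contained sketch (not the actual CliFrontend code; the class name and messages are made up) that peels leading general options such as -v off the argument list before looking for the action, so that both `./flink -v run ...` and `./flink run -v ...` work:

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LenientCliParsing {

	public static void main(String[] args) {
		List<String> remaining = new ArrayList<>(Arrays.asList(args));
		boolean verbose = false;

		// Consume general options that appear before the action keyword.
		while (!remaining.isEmpty()
				&& (remaining.get(0).equals("-v") || remaining.get(0).equals("--verbose"))) {
			verbose = true;
			remaining.remove(0);
		}

		if (remaining.isEmpty()) {
			// Short, targeted error instead of the full help text.
			System.err.println("Please specify an action (run, info, list, cancel).");
			return;
		}

		String action = remaining.remove(0);
		System.out.println("action=" + action + ", verbose=" + verbose + ", remaining=" + remaining);
	}
}
{code}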




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1434) Web interface cannot be used to run streaming programs

2015-01-22 Thread Gyula Fora (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287289#comment-14287289
 ] 

Gyula Fora commented on FLINK-1434:
---

This issue is also linked with 
https://issues.apache.org/jira/browse/FLINK-1401

 Web interface cannot be used to run streaming programs
 --

 Key: FLINK-1434
 URL: https://issues.apache.org/jira/browse/FLINK-1434
 Project: Flink
  Issue Type: Bug
  Components: Streaming, Webfrontend
Affects Versions: 0.9
Reporter: Gyula Fora

 Flink streaming programs currently cannot be submitted through the web 
 client. When you try to run the jar, you get a ProgramInvocationException.
 The reason for this might be that streaming programs completely bypass the 
 use of Plans for job execution and the streaming execution environment 
 directly submits the jobgraph to the client.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1436) Command-line interface verbose option (-v)

2015-01-22 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287509#comment-14287509
 ] 

Robert Metzger commented on FLINK-1436:
---

I totally agree with you.

Also, it would be nice if the CLI tool said what it got as the action, in 
your case for example "Invalid action: -v".
I had a similar issue with the jar file; the only error message I got was 
"Invalid jar file" or so. It would have been nicer if the output was 
"org.apache.flink.Wordcount is not a valid file location".

In addition to that, I don't see a reason why there is a -v option at all; 
I'm always using -v because I want to see the exceptions that happen.
So I vote to a) improve the error messages,
b) remove the verbose flag and make the output always verbose, and
c) not show the full help text on errors.

 Command-line interface verbose option (-v)
 --

 Key: FLINK-1436
 URL: https://issues.apache.org/jira/browse/FLINK-1436
 Project: Flink
  Issue Type: Improvement
  Components: Start-Stop Scripts
Reporter: Max Michels
Priority: Trivial
  Labels: usability

 Let me run just a basic Flink job and add the verbose flag. It's a general 
 option, so let me add it as a first parameter:
  ./flink -v run ../examples/flink-java-examples-0.8.0-WordCount.jar 
  hdfs:///input hdfs:///output9
 Invalid action!
 ./flink ACTION [GENERAL_OPTIONS] [ARGUMENTS]
   general options:
  -h,--help  Show the help for the CLI Frontend.
  -v,--verbose   Print more detailed error messages.
 Action run compiles and runs a program.
   Syntax: run [OPTIONS] jar-file arguments
   run action arguments:
  -c,--class classname   Class with the program entry point (main
   method or getPlan() method. Only needed
   if the JAR file does not specify the class
   in its manifest.
  -m,--jobmanager host:port  Address of the JobManager (master) to
   which to connect. Use this flag to connect
   to a different JobManager than the one
   specified in the configuration.
  -p,--parallelism parallelism   The parallelism with which to run the
   program. Optional flag to override the
   default value specified in the
   configuration.
 Action info displays information about a program.
   info action arguments:
  -c,--class classname   Class with the program entry point (main
   method or getPlan() method. Only needed
   if the JAR file does not specify the class
   in its manifest.
  -e,--executionplan   Show optimized execution plan of the
   program (JSON)
  -m,--jobmanager host:port  Address of the JobManager (master) to
   which to connect. Use this flag to connect
   to a different JobManager than the one
   specified in the configuration.
  -p,--parallelism parallelism   The parallelism with which to run the
   program. Optional flag to override the
   default value specified in the
   configuration.
 Action list lists running and finished programs.
   list action arguments:
  -m,--jobmanager host:port   Address of the JobManager (master) to which
to connect. Use this flag to connect to a
different JobManager than the one specified
in the configuration.
  -r,--running  Show running programs and their JobIDs
  -s,--scheduled        Show scheduled programs and their JobIDs
 Action cancel cancels a running program.
   cancel action arguments:
  -i,--jobid jobIDJobID of program to cancel
  -m,--jobmanager host:port   Address of the JobManager (master) to which
to connect. Use this flag to connect to a
different JobManager than the one specified
in the configuration.
 What just happened? This results in a lot of output which is usually 
 generated if you use the --help option on command-line tools. If your 
 terminal window is large enough, then you will see a tiny message:
 Please specify an action. I did 

[jira] [Updated] (FLINK-1435) TaskManager does not log missing memory error on start up

2015-01-22 Thread Malte Schwarzer (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Malte Schwarzer updated FLINK-1435:
---
Description: 
When using bin/start-cluster.sh to start TaskManagers and a worker node fails 
to start because of missing memory, you do not receive any error message in the 
log files.

The worker node has only 15000M of memory available, but it is configured with 
Maximum heap size: 4 MiBytes. The task manager does not join the cluster and 
the process hangs.

The last lines of the log look like this:
...
... - - Starting with 12 incoming and 12 outgoing connection threads.
... - Setting low water mark to 16384 and high water mark to 32768 bytes.
... - Instantiated PooledByteBufAllocator with direct arenas: 24, heap arenas: 
0, page size (bytes): 65536, chunk size (bytes): 16777216.
... - Using 0.7 of the free heap space for managed memory.
... - Initializing memory manager with 24447 megabytes of memory. Page size is 
32768 bytes.
(END)

Error message about not enough memory is missing.



  was:
When using bin/start-cluster.sh to start TaskManagers and a worker node fails 
to start because of missing memory, you do not receive any error message in the 
log files.

The worker node has only 15000M of memory available, but it is configured with 
Maximum heap size: 4 MiBytes. The task manager does not join the cluster and 
the process seems to be stuck.

The last line of the log looks like this:

...  - Initializing memory manager with 24447 megabytes of memory. Page size is 
32768 bytes.




 TaskManager does not log missing memory error on start up
 -

 Key: FLINK-1435
 URL: https://issues.apache.org/jira/browse/FLINK-1435
 Project: Flink
  Issue Type: Bug
  Components: TaskManager
Affects Versions: 0.7.0-incubating
Reporter: Malte Schwarzer
Priority: Minor
  Labels: memorymanager

 When using bin/start-cluster.sh to start TaskManagers and a worker node is 
 failing to start because of missing memory, you do not receive any error 
 messages in log files.
 Worker node has only 15000M memory available, but it is configured with 
 Maximum heap size: 4 MiBytes. Task manager does not join the cluster. 
 Process hangs.
 The last lines of the log look like this:
 ...
 ... - - Starting with 12 incoming and 12 outgoing connection threads.
 ... - Setting low water mark to 16384 and high water mark to 32768 bytes.
 ... - Instantiated PooledByteBufAllocator with direct arenas: 24, heap 
 arenas: 0, page size (bytes): 65536, chunk size (bytes): 16777216.
 ... - Using 0.7 of the free heap space for managed memory.
 ... - Initializing memory manager with 24447 megabytes of memory. Page size 
 is 32768 bytes.
 (END)
 Error message about not enough memory is missing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-1434) Web interface cannot be used to run streaming programs

2015-01-22 Thread Gyula Fora (JIRA)
Gyula Fora created FLINK-1434:
-

 Summary: Web interface cannot be used to run streaming programs
 Key: FLINK-1434
 URL: https://issues.apache.org/jira/browse/FLINK-1434
 Project: Flink
  Issue Type: Bug
  Components: Streaming, Webfrontend
Affects Versions: 0.9
Reporter: Gyula Fora


Flink streaming programs currently cannot be submitted through the web client. 
When you try to run the jar, you get a ProgramInvocationException.

The reason for this might be that streaming programs completely bypass the use 
of Plans for job execution and the streaming execution environment directly 
submits the jobgraph to the client.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1372] [runtime] Fixes Akka logging

2015-01-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/329




[jira] [Commented] (FLINK-1435) TaskManager does not log missing memory error on start up

2015-01-22 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287340#comment-14287340
 ] 

Ufuk Celebi commented on FLINK-1435:


Thanks for reporting the issue.

I agree that there should be an informative error message. Currently, the 
memory manager is initialized from the config value and tries to allocate all 
of that memory as byte buffers. At some point it simply fails when not enough 
memory is available. In the GlobalBufferPool, we actually catch and rethrow the 
OutOfMemoryException and log the missing amount of memory. We should do 
something similar here as well, or put the memory manager instantiation into a 
try-catch block.

Would you like to provide a patch for this?
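
A minimal sketch of the suggested guard, assuming the eager allocation is wrapped in a try/catch (class name, allocation loop, and log message are illustrative, not the actual MemoryManager code):

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class GuardedMemoryAllocation {

	private static final Logger LOG = LoggerFactory.getLogger(GuardedMemoryAllocation.class);

	static byte[][] allocatePages(long memorySizeBytes, int pageSize) {
		int numPages = (int) (memorySizeBytes / pageSize);
		try {
			byte[][] pages = new byte[numPages][];
			for (int i = 0; i < numPages; i++) {
				pages[i] = new byte[pageSize];
			}
			return pages;
		} catch (OutOfMemoryError e) {
			// This is the message that is currently missing from the TaskManager log.
			LOG.error("Could not allocate {} MB of managed memory ({} pages of {} bytes). " +
					"Decrease the configured managed memory size or increase the JVM heap (-Xmx).",
					memorySizeBytes >> 20, numPages, pageSize, e);
			throw e;
		}
	}
}
{code}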

 TaskManager does not log missing memory error on start up
 -

 Key: FLINK-1435
 URL: https://issues.apache.org/jira/browse/FLINK-1435
 Project: Flink
  Issue Type: Bug
  Components: TaskManager
Affects Versions: 0.7.0-incubating
Reporter: Malte Schwarzer
Priority: Minor
  Labels: memorymanager

 When using bin/start-cluster.sh to start TaskManagers and a worker node is 
 failing to start because of missing memory, you do not receive any error 
 messages in log files.
 Worker node has only 15000M memory available, but it is configured with 
 Maximum heap size: 4 MiBytes. Task manager does not join the cluster. 
 Process hangs.
 The last lines of the log look like this:
 ...
 ... - - Starting with 12 incoming and 12 outgoing connection threads.
 ... - Setting low water mark to 16384 and high water mark to 32768 bytes.
 ... - Instantiated PooledByteBufAllocator with direct arenas: 24, heap 
 arenas: 0, page size (bytes): 65536, chunk size (bytes): 16777216.
 ... - Using 0.7 of the free heap space for managed memory.
 ... - Initializing memory manager with 24447 megabytes of memory. Page size 
 is 32768 bytes.
 (END)
 Error message about not enough memory is missing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-1372) TaskManager and JobManager do not log startup settings any more

2015-01-22 Thread Till Rohrmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-1372.

Resolution: Fixed

Fixed in 631b6eb804dc4a0236198574a8f8011ab2b6c8c2

 TaskManager and JobManager do not log startup settings any more
 ---

 Key: FLINK-1372
 URL: https://issues.apache.org/jira/browse/FLINK-1372
 Project: Flink
  Issue Type: Bug
  Components: JobManager, TaskManager
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Till Rohrmann
 Fix For: 0.9


 In prior versions, the jobmanager and taskmanager logged a lot of startup 
 options:
  - Environment
  - ports
  - memory configuration
  - network configuration
 Currently, they log very little. We should add the logging back in.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1352] [runtime] Fix buggy registration ...

2015-01-22 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/328#issuecomment-71002640
  
You are right @hsaputra; I'm not sure which approach is best. 
In the corresponding JIRA issue I have tried to summarize what I think 
are the pros and cons of indefinitely many registration tries vs. a limited 
number of tries, and of a constant pause between tries vs. an increasing pause.

Indefinitely many registration tries:
Pros: If the JobManager becomes available at some point in time, then the 
TaskManager will definitely connect to it
Cons: If the JobManager dies for some reason, then the TaskManager will 
linger around for all eternity or until it is stopped manually

Limited number of tries:
Pros: Will terminate itself after some time
Cons: The time interval might be too short for the JobManager to get started

Constant pause:
Pros: Relatively quick response time
Cons: Causing network traffic until the JobManager has been started

Increasing pause:
Pros: Reduction of network traffic if the JobManager takes a little bit 
longer to start
Cons: Might delay the registration process if one interval was just missed




[jira] [Commented] (FLINK-1352) Buggy registration from TaskManager to JobManager

2015-01-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287259#comment-14287259
 ] 

ASF GitHub Bot commented on FLINK-1352:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/328#issuecomment-71002640
  
You are right @hsaputra; I'm not sure which approach is best. 
In the corresponding JIRA issue I have tried to summarize what I think 
are the pros and cons of indefinitely many registration tries vs. a limited 
number of tries, and of a constant pause between tries vs. an increasing pause.

Indefinitely many registration tries:
Pros: If the JobManager becomes available at some point in time, then the 
TaskManager will definitely connect to it
Cons: If the JobManager dies for some reason, then the TaskManager will 
linger around for all eternity or until it is stopped manually

Limited number of tries:
Pros: Will terminate itself after some time
Cons: The time interval might be too short for the JobManager to get started

Constant pause:
Pros: Relatively quick response time
Cons: Causing network traffic until the JobManager has been started

Increasing pause:
Pros: Reduction of network traffic if the JobManager takes a little bit 
longer to start
Cons: Might delay the registration process if one interval was just missed


 Buggy registration from TaskManager to JobManager
 -

 Key: FLINK-1352
 URL: https://issues.apache.org/jira/browse/FLINK-1352
 Project: Flink
  Issue Type: Bug
  Components: JobManager, TaskManager
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Till Rohrmann
 Fix For: 0.9


 The JobManager's InstanceManager may refuse the registration attempt from a 
 TaskManager, because it has this TaskManager already connected or, in the 
 future, because the TaskManager has been blacklisted as unreliable.
 Upon refused registration, the instance ID is null, to signal the refusal. The 
 TaskManager reacts incorrectly to such messages, assuming successful 
 registration.
 Possible solution: the JobManager sends back a dedicated RegistrationRefused 
 message if the instance manager returns null as the registration result. If 
 the TaskManager receives that before being registered, it knows that the 
 registration response was lost (which should not happen over TCP and would 
 indicate a corrupt connection).
 Followup question: does it make sense to have the TaskManager try indefinitely 
 to connect to the JobManager, with an increasing interval (from seconds to 
 minutes)?
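
 For illustration, a small sketch of what such dedicated registration responses could look like (class names and fields are assumptions, not the actual Flink messages):

{code}
import java.io.Serializable;
import java.util.UUID;

// Hypothetical message types for the registration handshake described above.
public final class RegistrationMessages {

	/** Sent by the JobManager when the InstanceManager refuses the TaskManager. */
	public static final class RegistrationRefused implements Serializable {
		public final String reason;   // a loggable reason instead of a null instance ID
		public RegistrationRefused(String reason) { this.reason = reason; }
	}

	/** Sent on successful registration, carrying the assigned instance ID. */
	public static final class RegistrationAccepted implements Serializable {
		public final UUID instanceId;
		public RegistrationAccepted(UUID instanceId) { this.instanceId = instanceId; }
	}

	private RegistrationMessages() {}
}
{code}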



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: Rename coGroupDataSet.scala to CoGroupDataSet....

2015-01-22 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/324#issuecomment-71103567
  
You have my +1




[GitHub] flink pull request: Rename coGroupDataSet.scala to CoGroupDataSet....

2015-01-22 Thread hsaputra
Github user hsaputra commented on the pull request:

https://github.com/apache/flink/pull/324#issuecomment-71119613
  
Cool, thanks @StephanEwen. If no one beats me to merging it, I will do this EOD 
today.




[jira] [Commented] (FLINK-629) Add support for null values to the java api

2015-01-22 Thread mustafa elbehery (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288646#comment-14288646
 ] 

mustafa elbehery commented on FLINK-629:


It's working now. Thanks a lot, Robert, for your support :) :)

 Add support for null values to the java api
 ---

 Key: FLINK-629
 URL: https://issues.apache.org/jira/browse/FLINK-629
 Project: Flink
  Issue Type: Improvement
  Components: Java API
Reporter: Stephan Ewen
Assignee: Gyula Fora
Priority: Critical
  Labels: github-import
 Fix For: pre-apache

 Attachments: Selection_006.png, SimpleTweetInputFormat.java, 
 Tweet.java, model.tar.gz


 Currently, many runtime operations fail when encountering a null value. Tuple 
 serialization should allow null fields.
 I suggest adding a method to the tuples called `getFieldNotNull()` which 
 throws a meaningful exception when the accessed field is null. That way, we 
 simplify the logic of operators that should not deal with null fields, like 
 key grouping or aggregations.
 Even though SQL allows grouping and aggregating of null values, I suggest to 
 exclude this from the java api, because the SQL semantics of aggregating null 
 fields are messy.
  Imported from GitHub 
 Url: https://github.com/stratosphere/stratosphere/issues/629
 Created by: [StephanEwen|https://github.com/StephanEwen]
 Labels: enhancement, java api, 
 Milestone: Release 0.5.1
 Created at: Wed Mar 26 00:27:49 CET 2014
 State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1352] [runtime] Fix buggy registration ...

2015-01-22 Thread hsaputra
Github user hsaputra commented on the pull request:

https://github.com/apache/flink/pull/328#issuecomment-71148024
  
Thanks for the explanation @tillrohrmann.

+1 for the exponential backoff approach. We can have the max retries and the 
max delay for each try as configurable properties.





[jira] [Updated] (FLINK-1436) Command-line interface verbose option (-v)

2015-01-22 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen updated FLINK-1436:

Labels: starter usability  (was: usability)

 Command-line interface verbose option (-v)
 --

 Key: FLINK-1436
 URL: https://issues.apache.org/jira/browse/FLINK-1436
 Project: Flink
  Issue Type: Improvement
  Components: Start-Stop Scripts
Reporter: Max Michels
Priority: Trivial
  Labels: starter, usability

 Let me run just a basic Flink job and add the verbose flag. It's a general 
 option, so let me add it as a first parameter:
  ./flink -v run ../examples/flink-java-examples-0.8.0-WordCount.jar 
  hdfs:///input hdfs:///output9
 Invalid action!
 ./flink ACTION [GENERAL_OPTIONS] [ARGUMENTS]
   general options:
  -h,--help  Show the help for the CLI Frontend.
  -v,--verbose   Print more detailed error messages.
 Action run compiles and runs a program.
   Syntax: run [OPTIONS] jar-file arguments
   run action arguments:
  -c,--class classname   Class with the program entry point (main
   method or getPlan() method. Only needed
   if the JAR file does not specify the class
   in its manifest.
  -m,--jobmanager host:port  Address of the JobManager (master) to
   which to connect. Use this flag to connect
   to a different JobManager than the one
   specified in the configuration.
  -p,--parallelism parallelism   The parallelism with which to run the
   program. Optional flag to override the
   default value specified in the
   configuration.
 Action info displays information about a program.
   info action arguments:
  -c,--class classname   Class with the program entry point (main
   method or getPlan() method. Only needed
   if the JAR file does not specify the class
   in its manifest.
  -e,--executionplan   Show optimized execution plan of the
   program (JSON)
  -m,--jobmanager host:port  Address of the JobManager (master) to
   which to connect. Use this flag to connect
   to a different JobManager than the one
   specified in the configuration.
  -p,--parallelism parallelism   The parallelism with which to run the
   program. Optional flag to override the
   default value specified in the
   configuration.
 Action list lists running and finished programs.
   list action arguments:
  -m,--jobmanager host:port   Address of the JobManager (master) to which
to connect. Use this flag to connect to a
different JobManager than the one specified
in the configuration.
  -r,--running  Show running programs and their JobIDs
  -s,--scheduled        Show scheduled programs and their JobIDs
 Action cancel cancels a running program.
   cancel action arguments:
  -i,--jobid jobIDJobID of program to cancel
  -m,--jobmanager host:port   Address of the JobManager (master) to which
to connect. Use this flag to connect to a
different JobManager than the one specified
in the configuration.
 What just happened? This results in a lot of output which is usually 
 generated if you use the --help option on command-line tools. If your 
 terminal window is large enough, then you will see a tiny message:
 Please specify an action. I did specify an action. Strange. If you read the 
 help messages carefully, you see that general options belong to the action.
  ./flink run -v ../examples/flink-java-examples-0.8.0-WordCount.jar 
  hdfs:///input hdfs:///output9
 For the sake of mitigating user frustration, let us also accept -v as the 
 first argument. It may seem trivial for the day-to-day Flink user but makes a 
 difference for a novice.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1352) Buggy registration from TaskManager to JobManager

2015-01-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288569#comment-14288569
 ] 

ASF GitHub Bot commented on FLINK-1352:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/328#issuecomment-71133987
  
I am not sure that an infinite number of tries is actually bad. This sort 
of depends on the situation, I guess:
  - On YARN, it may make sense, because the node will then go back into the 
pool of available resources.
  - On standalone clusters, the node will be there for Flink anyway, so the 
TaskManager might as well keep trying to offer itself for work. Think of a 
network partitioning event - after the partitions re-join, the cluster should 
work as a whole again.

How about the following: we have a config parameter for how long nodes should 
attempt to register. YARN could set a timeout (say 2-5 minutes), while by 
default the timeout is infinite.

Concerning the attempt pause: attempts with exponential backoff (and a cap) 
are the common approach (and I think that was the default before). Start with 
a 50ms pause, double it on each attempt, and cap it at 1 or 2 minutes or so. If 
you miss early attempts, the pause will not be long. If you missed all attempts 
within the first second, you are guaranteed not to wait more than twice as long 
as you have already waited.

For the sake of transparency, and to make sure that the states are actually 
in sync: how about we have three response messages for the registration attempt:
  1. Refused (for whatever reason; the message should contain a string that 
the TM can log)
  2. Accepted (with the assigned ID)
  3. Already registered (with the assigned ID) - the current logic handles 
this correctly as well, but it will allow us to log better at the TaskManager 
and debug problems there much better. Since this is a mechanism which may have 
weird corner-case behavior, it would be good to know as much as possible about 
what happened.
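
A minimal sketch of the exponential backoff with a cap that is being discussed (the constants, names, and stand-in registration call are assumptions, not the actual Flink implementation):

{code}
import java.time.Duration;

public class RegistrationBackoff {

	private static final Duration INITIAL_PAUSE = Duration.ofMillis(50);
	private static final Duration MAX_PAUSE = Duration.ofMinutes(2);

	public static void main(String[] args) throws InterruptedException {
		Duration pause = INITIAL_PAUSE;
		int attempt = 0;

		while (!tryRegisterAtJobManager()) {
			attempt++;
			System.out.printf("Registration attempt %d failed, retrying in %d ms%n", attempt, pause.toMillis());
			Thread.sleep(pause.toMillis());
			// Double the pause after every failed attempt, but never exceed the cap.
			Duration doubled = pause.multipliedBy(2);
			pause = doubled.compareTo(MAX_PAUSE) > 0 ? MAX_PAUSE : doubled;
		}
		System.out.println("Registered after " + attempt + " failed attempts.");
	}

	// Stand-in for the actual registration RPC; succeeds after a few tries here.
	private static int calls = 0;
	private static boolean tryRegisterAtJobManager() {
		return ++calls > 3;
	}
}
{code}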



 Buggy registration from TaskManager to JobManager
 -

 Key: FLINK-1352
 URL: https://issues.apache.org/jira/browse/FLINK-1352
 Project: Flink
  Issue Type: Bug
  Components: JobManager, TaskManager
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Till Rohrmann
 Fix For: 0.9


 The JobManager's InstanceManager may refuse the registration attempt from a 
 TaskManager, because it has this TaskManager already connected or, in the 
 future, because the TaskManager has been blacklisted as unreliable.
 Upon refused registration, the instance ID is null, to signal the refusal. The 
 TaskManager reacts incorrectly to such messages, assuming successful 
 registration.
 Possible solution: the JobManager sends back a dedicated RegistrationRefused 
 message if the instance manager returns null as the registration result. If 
 the TaskManager receives that before being registered, it knows that the 
 registration response was lost (which should not happen over TCP and would 
 indicate a corrupt connection).
 Followup question: does it make sense to have the TaskManager try indefinitely 
 to connect to the JobManager, with an increasing interval (from seconds to 
 minutes)?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1436) Command-line interface verbose option (-v)

2015-01-22 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288533#comment-14288533
 ] 

Stephan Ewen commented on FLINK-1436:
-

+1 for all three suggestions from Robert.

 Command-line interface verbose option (-v)
 --

 Key: FLINK-1436
 URL: https://issues.apache.org/jira/browse/FLINK-1436
 Project: Flink
  Issue Type: Improvement
  Components: Start-Stop Scripts
Reporter: Max Michels
Priority: Trivial
  Labels: usability

 Let me run just a basic Flink job and add the verbose flag. It's a general 
 option, so let me add it as a first parameter:
  ./flink -v run ../examples/flink-java-examples-0.8.0-WordCount.jar 
  hdfs:///input hdfs:///output9
 Invalid action!
 ./flink ACTION [GENERAL_OPTIONS] [ARGUMENTS]
   general options:
  -h,--help  Show the help for the CLI Frontend.
  -v,--verbose   Print more detailed error messages.
 Action run compiles and runs a program.
   Syntax: run [OPTIONS] jar-file arguments
   run action arguments:
  -c,--class classname   Class with the program entry point (main
   method or getPlan() method. Only needed
   if the JAR file does not specify the class
   in its manifest.
  -m,--jobmanager host:port  Address of the JobManager (master) to
   which to connect. Use this flag to connect
   to a different JobManager than the one
   specified in the configuration.
  -p,--parallelism parallelism   The parallelism with which to run the
   program. Optional flag to override the
   default value specified in the
   configuration.
 Action info displays information about a program.
   info action arguments:
  -c,--class classname   Class with the program entry point (main
   method or getPlan() method. Only needed
   if the JAR file does not specify the class
   in its manifest.
  -e,--executionplan   Show optimized execution plan of the
   program (JSON)
  -m,--jobmanager host:port  Address of the JobManager (master) to
   which to connect. Use this flag to connect
   to a different JobManager than the one
   specified in the configuration.
  -p,--parallelism parallelism   The parallelism with which to run the
   program. Optional flag to override the
   default value specified in the
   configuration.
 Action list lists running and finished programs.
   list action arguments:
  -m,--jobmanager host:port   Address of the JobManager (master) to which
to connect. Use this flag to connect to a
different JobManager than the one specified
in the configuration.
  -r,--running  Show running programs and their JobIDs
  -s,--scheduled        Show scheduled programs and their JobIDs
 Action cancel cancels a running program.
   cancel action arguments:
  -i,--jobid jobIDJobID of program to cancel
  -m,--jobmanager host:port   Address of the JobManager (master) to which
to connect. Use this flag to connect to a
different JobManager than the one specified
in the configuration.
 What just happened? This results in a lot of output which is usually 
 generated if you use the --help option on command-line tools. If your 
 terminal window is large enough, then you will see a tiny message:
 Please specify an action. I did specify an action. Strange. If you read the 
 help messages carefully, you see that general options belong to the action.
  ./flink run -v ../examples/flink-java-examples-0.8.0-WordCount.jar 
  hdfs:///input hdfs:///output9
 For the sake of mitigating user frustration, let us also accept -v as the 
 first argument. It may seem trivial for the day-to-day Flink user but makes a 
 difference for a novice.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1424) bin/flink run does not recognize -c parameter anymore

2015-01-22 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288543#comment-14288543
 ] 

Stephan Ewen commented on FLINK-1424:
-

[~aalexandrov] was also suggesting to use a different CLI library in 
[FLINK-1347]. Maybe that would fix this as well.

 bin/flink run does not recognize -c parameter anymore
 -

 Key: FLINK-1424
 URL: https://issues.apache.org/jira/browse/FLINK-1424
 Project: Flink
  Issue Type: Bug
  Components: TaskManager
Affects Versions: master
Reporter: Carsten Brandt

 bin/flink binary does not recognize `-c` parameter anymore which specifies 
 the class to run:
 {noformat}
 $ ./flink run /path/to/target/impro3-ws14-flink-1.0-SNAPSHOT.jar -c 
 de.tu_berlin.impro3.flink.etl.FollowerGraphGenerator /tmp/flink/testgraph.txt 
 1
 usage: emma-experiments-impro3-ss14-flink
[-?]
 emma-experiments-impro3-ss14-flink: error: unrecognized arguments: '-c'
 {noformat}
 Before, this command worked fine and executed the job.
 I tracked it down to the following commit using `git bisect`:
 {noformat}
 93eadca782ee8c77f89609f6d924d73021dcdda9 is the first bad commit
 commit 93eadca782ee8c77f89609f6d924d73021dcdda9
 Author: Alexander Alexandrov alexander.s.alexand...@gmail.com
 Date:   Wed Dec 24 13:49:56 2014 +0200
 [FLINK-1027] [cli] Added support for '--' and '-' prefixed tokens in CLI 
 program arguments.
 
 This closes #278
 :04 04 a1358e6f7fe308b4d51a47069f190a29f87fdeda 
 d6f11bbc9444227d5c6297ec908e44b9644289a9 Mflink-clients
 {noformat}
 https://github.com/apache/flink/commit/93eadca782ee8c77f89609f6d924d73021dcdda9



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: Rename coGroupDataSet.scala to CoGroupDataSet....

2015-01-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/324




[GitHub] flink pull request: Rename coGroupDataSet.scala to CoGroupDataSet....

2015-01-22 Thread hsaputra
Github user hsaputra commented on the pull request:

https://github.com/apache/flink/pull/324#issuecomment-71134892
  
As per recommendation from @StephanEwen, will not merge this to 0.8 until 
we need to cherry-pick fixes related to these files.




[GitHub] flink pull request: Rename coGroupDataSet.scala to CoGroupDataSet....

2015-01-22 Thread hsaputra
Github user hsaputra commented on the pull request:

https://github.com/apache/flink/pull/324#issuecomment-71079720
  
Can I get +1 for this one?




[jira] [Commented] (FLINK-1433) Add HADOOP_CLASSPATH to start scripts

2015-01-22 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287776#comment-14287776
 ] 

Stephan Ewen commented on FLINK-1433:
-

+1 that seems very important

 Add HADOOP_CLASSPATH to start scripts
 -

 Key: FLINK-1433
 URL: https://issues.apache.org/jira/browse/FLINK-1433
 Project: Flink
  Issue Type: Improvement
Reporter: Robert Metzger

 With the Hadoop file system wrapper, it's important to have access to the 
 Hadoop filesystem classes.
 HADOOP_CLASSPATH seems to be a standard environment variable used by 
 Hadoop for such libraries.
 Deployments like Google Compute Cloud set this variable to contain the 
 Google Cloud Storage Hadoop wrapper. So if users want to use Cloud 
 Storage in a non-YARN environment, we need to address this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1437) Bug in PojoSerializer's copy() method

2015-01-22 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287801#comment-14287801
 ] 

Stephan Ewen commented on FLINK-1437:
-

Good catch. The POJO serializer can probably handle this, without needing to 
update all the individual serializers?
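
A minimal sketch of how a serializer's copy path can tolerate null fields by writing a per-field null marker (names and the helper interface are placeholders, not the actual PojoSerializer code):

{code}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public class NullAwareFieldCopy {

	// Stand-in for TypeSerializer#copy(DataInputView, DataOutputView).
	interface FieldCopier {
		void copy(DataInput source, DataOutput target) throws IOException;
	}

	/** Copies one field from a serialized source to a serialized target, handling null. */
	static void copyField(DataInput source, DataOutput target, FieldCopier fieldCopier) throws IOException {
		boolean isNull = source.readBoolean();   // null marker written during serialization
		target.writeBoolean(isNull);
		if (!isNull) {
			fieldCopier.copy(source, target);    // only copy actual data for non-null fields
		}
	}
}
{code}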

 Bug in PojoSerializer's copy() method
 -

 Key: FLINK-1437
 URL: https://issues.apache.org/jira/browse/FLINK-1437
 Project: Flink
  Issue Type: Bug
  Components: Java API
Reporter: Timo Walther
Assignee: Timo Walther

 The PojoSerializer's {{copy()}} method does not work properly with {{null}} 
 values. An exception could look like:
 {code}
 Caused by: java.io.IOException: Thread 'SortMerger spilling thread' 
 terminated due to an exception: null
   at 
 org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:792)
 Caused by: java.io.EOFException
   at 
 org.apache.flink.runtime.io.disk.RandomAccessInputView.nextSegment(RandomAccessInputView.java:83)
   at 
 org.apache.flink.runtime.memorymanager.AbstractPagedInputView.advance(AbstractPagedInputView.java:159)
   at 
 org.apache.flink.runtime.memorymanager.AbstractPagedInputView.readByte(AbstractPagedInputView.java:270)
   at 
 org.apache.flink.runtime.memorymanager.AbstractPagedInputView.readUnsignedByte(AbstractPagedInputView.java:277)
   at org.apache.flink.types.StringValue.copyString(StringValue.java:839)
   at 
 org.apache.flink.api.common.typeutils.base.StringSerializer.copy(StringSerializer.java:83)
   at 
 org.apache.flink.api.java.typeutils.runtime.PojoSerializer.copy(PojoSerializer.java:261)
   at 
 org.apache.flink.runtime.operators.sort.NormalizedKeySorter.writeToOutput(NormalizedKeySorter.java:449)
   at 
 org.apache.flink.runtime.operators.sort.UnilateralSortMerger$SpillingThread.go(UnilateralSortMerger.java:1303)
   at 
 org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:788)
 {code}
 I'm working on a fix for that...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1353) Execution always uses DefaultAkkaAskTimeout, rather than the configured value

2015-01-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288912#comment-14288912
 ] 

ASF GitHub Bot commented on FLINK-1353:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/325#issuecomment-71157412
  
Looks good to merge


 Execution always uses DefaultAkkaAskTimeout, rather than the configured value
 -

 Key: FLINK-1353
 URL: https://issues.apache.org/jira/browse/FLINK-1353
 Project: Flink
  Issue Type: Bug
  Components: JobManager
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Till Rohrmann
 Fix For: 0.9






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1353] Fixes the Execution to use the co...

2015-01-22 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/325#issuecomment-71157412
  
Looks good to merge




[GitHub] flink pull request: [FLINK-1201] Add flink-gelly to flink-addons

2015-01-22 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/326#issuecomment-71157010
  
I took the pull request and filtered the branch to move the files into the 
`flink-addons/flink-gelly` directory (that worked better for me than the 
subtree merge). Also, I prefixed all commit messages with the JIRA issue and 
component. You can find it here; it perfectly preserves all commit history.

https://github.com/StephanEwen/incubator-flink/commits/gelly

Let me know if I should merge it like that.






[jira] [Commented] (FLINK-1392) Serializing Protobuf - issue 1

2015-01-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288929#comment-14288929
 ] 

ASF GitHub Bot commented on FLINK-1392:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/322#issuecomment-71158115
  
Per the discussion on the dev mailing list 
(http://mail-archives.apache.org/mod_mbox/flink-dev/201501.mbox/browser), 
should this be registered by the code that analyzes the generic type and sets 
up the serializer for the encountered (nested) types? [FLINK-1417] 


 Serializing Protobuf - issue 1
 --

 Key: FLINK-1392
 URL: https://issues.apache.org/jira/browse/FLINK-1392
 Project: Flink
  Issue Type: Bug
Reporter: Felix Neutatz
Assignee: Robert Metzger
Priority: Minor

 Hi, I started to experiment with Parquet using Protobuf.
 When I use the standard Protobuf class: 
 com.twitter.data.proto.tutorial.AddressBookProtos
 The code that I run can be found here: 
 [https://github.com/FelixNeutatz/incubator-flink/blob/ParquetAtFlink/flink-addons/flink-hadoop-compatibility/src/main/java/org/apache/flink/hadoopcompatibility/mapreduce/example/ParquetProtobufOutput.java]
 I get the following exception:
 {code:xml}
 Exception in thread main java.lang.Exception: Deserializing the 
 InputFormat (org.apache.flink.api.java.io.CollectionInputFormat) failed: 
 Could not read the user code wrapper: Error while deserializing element from 
 collection
   at 
 org.apache.flink.runtime.jobgraph.InputFormatVertex.initializeOnMaster(InputFormatVertex.java:60)
   at 
 org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$5.apply(JobManager.scala:179)
   at 
 org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$5.apply(JobManager.scala:172)
   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
   at 
 org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:172)
   at 
 scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
   at 
 scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
   at 
 scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
   at 
 org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:34)
   at 
 org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:27)
   at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
   at 
 org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:27)
   at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
   at 
 org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:52)
   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
   at akka.actor.ActorCell.invoke(ActorCell.scala:487)
   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
   at akka.dispatch.Mailbox.run(Mailbox.scala:221)
   at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
   at 
 scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
   at 
 scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
   at 
 scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 Caused by: 
 org.apache.flink.runtime.operators.util.CorruptConfigurationException: Could 
 not read the user code wrapper: Error while deserializing element from 
 collection
   at 
 org.apache.flink.runtime.operators.util.TaskConfig.getStubWrapper(TaskConfig.java:285)
   at 
 org.apache.flink.runtime.jobgraph.InputFormatVertex.initializeOnMaster(InputFormatVertex.java:57)
   ... 25 more
 Caused by: java.io.IOException: Error while deserializing element from 
 collection
   at 
 org.apache.flink.api.java.io.CollectionInputFormat.readObject(CollectionInputFormat.java:108)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
   at 

[jira] [Commented] (FLINK-1391) Kryo fails to properly serialize avro collection types

2015-01-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288922#comment-14288922
 ] 

ASF GitHub Bot commented on FLINK-1391:
---

Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/323#discussion_r23435527
  
--- Diff: flink-java/src/main/java/org/apache/flink/api/java/typeutils/runtime/KryoSerializer.java ---
@@ -237,6 +244,25 @@ private void checkKryoInitialized() {
 		// Throwable and all subclasses should be serialized via java serialization
 		kryo.addDefaultSerializer(Throwable.class, new JavaSerializer());
 
+		// If the type we have to serialize as a GenricType is implementing SpecificRecordBase,
+		// we have to register the avro serializer
+		// This rule only applies if users explicitly use the GenericTypeInformation for the avro types
+		// usually, we are able to handle Avro POJOs with the POJO serializer.
+		if(SpecificRecordBase.class.isAssignableFrom(type)) {
+			ClassTag<SpecificRecordBase> tag = scala.reflect.ClassTag$.MODULE$.apply(type);
+			this.kryo.register(type, com.twitter.chill.avro.AvroSerializer.SpecificRecordSerializer(tag));
+
+		}
+		// Avro POJOs contain java.util.List which have GenericData.Array as their runtime type
+		// because Kryo is not able to serialize them properly, we use this serializer for them
+		this.kryo.register(GenericData.Array.class, new SpecificInstanceCollectionSerializer(ArrayList.class));
--- End diff --

Should we make this registration conditional so that it only happens when 
we have encountered an Avro type?


 Kryo fails to properly serialize avro collection types
 --

 Key: FLINK-1391
 URL: https://issues.apache.org/jira/browse/FLINK-1391
 Project: Flink
  Issue Type: Improvement
Affects Versions: 0.8, 0.9
Reporter: Robert Metzger
Assignee: Robert Metzger

 Before FLINK-610, Avro was the default generic serializer.
 Now, special types coming from Avro are handled by Kryo .. which seems to 
 cause errors like:
 {code}
 Exception in thread main 
 org.apache.flink.runtime.client.JobExecutionException: 
 java.lang.NullPointerException
   at org.apache.avro.generic.GenericData$Array.add(GenericData.java:200)
   at 
 com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:116)
   at 
 com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:22)
   at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
   at 
 org.apache.flink.api.java.typeutils.runtime.KryoSerializer.deserialize(KryoSerializer.java:143)
   at 
 org.apache.flink.api.java.typeutils.runtime.KryoSerializer.deserialize(KryoSerializer.java:148)
   at 
 org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:244)
   at 
 org.apache.flink.runtime.plugable.DeserializationDelegate.read(DeserializationDelegate.java:56)
   at 
 org.apache.flink.runtime.io.network.serialization.AdaptiveSpanningRecordDeserializer.getNextRecord(AdaptiveSpanningRecordDeserializer.java:71)
   at 
 org.apache.flink.runtime.io.network.channels.InputChannel.readRecord(InputChannel.java:189)
   at 
 org.apache.flink.runtime.io.network.gates.InputGate.readRecord(InputGate.java:176)
   at 
 org.apache.flink.runtime.io.network.api.MutableRecordReader.next(MutableRecordReader.java:51)
   at 
 org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:53)
   at 
 org.apache.flink.runtime.operators.DataSinkTask.invoke(DataSinkTask.java:170)
   at 
 org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:257)
   at java.lang.Thread.run(Thread.java:744)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1391] Add support for using Avro-POJOs ...

2015-01-22 Thread StephanEwen
Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/323#discussion_r23435527
  
--- Diff: flink-java/src/main/java/org/apache/flink/api/java/typeutils/runtime/KryoSerializer.java ---
@@ -237,6 +244,25 @@ private void checkKryoInitialized() {
 		// Throwable and all subclasses should be serialized via java serialization
 		kryo.addDefaultSerializer(Throwable.class, new JavaSerializer());
 
+		// If the type we have to serialize as a GenricType is implementing SpecificRecordBase,
+		// we have to register the avro serializer
+		// This rule only applies if users explicitly use the GenericTypeInformation for the avro types
+		// usually, we are able to handle Avro POJOs with the POJO serializer.
+		if(SpecificRecordBase.class.isAssignableFrom(type)) {
+			ClassTag<SpecificRecordBase> tag = scala.reflect.ClassTag$.MODULE$.apply(type);
+			this.kryo.register(type, com.twitter.chill.avro.AvroSerializer.SpecificRecordSerializer(tag));
+
+		}
+		// Avro POJOs contain java.util.List which have GenericData.Array as their runtime type
+		// because Kryo is not able to serialize them properly, we use this serializer for them
+		this.kryo.register(GenericData.Array.class, new SpecificInstanceCollectionSerializer(ArrayList.class));
--- End diff --

Should we make this registration conditional so that it only happens when 
we have encountered an Avro type?




[jira] [Commented] (FLINK-1147) TypeInference on POJOs

2015-01-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288937#comment-14288937
 ] 

ASF GitHub Bot commented on FLINK-1147:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/315#issuecomment-71158420
  
+1 to merge from me as well. Go ahead, Timo...


 TypeInference on POJOs
 --

 Key: FLINK-1147
 URL: https://issues.apache.org/jira/browse/FLINK-1147
 Project: Flink
  Issue Type: Improvement
  Components: Java API
Affects Versions: 0.7.0-incubating
Reporter: Stephan Ewen
Assignee: Timo Walther

 On Tuples, we currently use type inference that figures out the types of 
 output type variables relative to the input type variable.
 We need similar functionality for POJOs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1147][Java API] TypeInference on POJOs

2015-01-22 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/315#issuecomment-71158420
  
+1 to merge from me as well. Go ahead, Timo...




[jira] [Commented] (FLINK-1165) No createCollectionsEnvironment in Java API

2015-01-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288935#comment-14288935
 ] 

ASF GitHub Bot commented on FLINK-1165:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/320#issuecomment-71158365
  
I vote to call it `createCollectionsEnvironment` in Java as well, so that 
the APIs are in sync, and we do not have a breaking change in the Scala API.

Otherwise this is good to merge.

Thanks, Ajay!


 No createCollectionsEnvironment in Java API
 ---

 Key: FLINK-1165
 URL: https://issues.apache.org/jira/browse/FLINK-1165
 Project: Flink
  Issue Type: Improvement
Reporter: Till Rohrmann

 In the Scala API the ExecutionEnvironment has the method 
 createCollectionEnvironment but not in the Java API. We should stick to one 
 approach in both APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)