[jira] [Comment Edited] (FLINK-20043) Add flink-sql-connector-kinesis package
[ https://issues.apache.org/jira/browse/FLINK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230725#comment-17230725 ]

Alexander Alexandrov edited comment on FLINK-20043 at 11/12/20, 4:14 PM:
-------------------------------------------------------------------------
I can verify that the following dependencies are required and that not including them will result in compile-time errors:
* [jackson.*|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/util/AWSUtil.java#L46-L50]
* [guava|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java#L41-L44]

I am struggling to identify a code path that would require {{joda-time}}, though. I cannot even figure out which {{pom.xml}} declares the dependency on {{joda-time}}.

was (Author: aalexandrov):
I can verify that the following dependencies are required and that not including them will result in compile-time errors:
* [jackson.*|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/util/AWSUtil.java#L46-L50].
* [guava|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java#L41-L44]

I am struggling to identify a code path that would require {{joda-time}}, though. I cannot even figure out which {{pom.xml}} declares the dependency on {{joda-time}}.
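The Guava exposure comes from the Kinesis Producer Library itself: it hands results back as a Guava {{ListenableFuture}}, so anything that calls it (including {{FlinkKinesisProducer}}) has to compile against Guava. A minimal sketch — illustrative only, not the Flink source; the stream name and payload are made up:

{code:java}
import com.amazonaws.services.kinesis.producer.KinesisProducer;
import com.amazonaws.services.kinesis.producer.UserRecordResult;
import com.google.common.util.concurrent.ListenableFuture;

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class GuavaExposureSketch {
    public static void main(String[] args) throws Exception {
        KinesisProducer producer = new KinesisProducer();
        ByteBuffer data = ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8));
        // addUserRecord returns a Guava type, which is exactly what makes
        // guava a compile-time dependency for every caller of the KPL
        ListenableFuture<UserRecordResult> future =
                producer.addUserRecord("example-stream", "example-key", data);
        System.out.println("sequence number: " + future.get().getSequenceNumber());
        producer.destroy();
    }
}
{code}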
[jira] [Comment Edited] (FLINK-20043) Add flink-sql-connector-kinesis package
[ https://issues.apache.org/jira/browse/FLINK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230725#comment-17230725 ]

Alexander Alexandrov edited comment on FLINK-20043 at 11/12/20, 4:12 PM:
-------------------------------------------------------------------------
I can verify that the following dependencies are required and that not including them will result in compile-time errors:
* [jackson.*|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/util/AWSUtil.java#L46-L50].
* [guava|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java#L41-L44]

I am struggling to identify a code path that would require {{joda-time}}, though. I cannot even figure out which {{pom.xml}} declares the dependency on {{joda-time}}.

was (Author: aalexandrov):
I can verify that [Jackson is required at runtime|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/util/AWSUtil.java#L46-L50] and we will hit errors if we don't include it.

I am struggling to identify a code path that would require {{joda-time}}, though. I cannot even figure out which {{pom.xml}} declares the dependency on {{joda-time}}.
[jira] [Commented] (FLINK-20043) Add flink-sql-connector-kinesis package
[ https://issues.apache.org/jira/browse/FLINK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230725#comment-17230725 ]

Alexander Alexandrov commented on FLINK-20043:
----------------------------------------------
I can verify that [Jackson is required at runtime|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/util/AWSUtil.java#L46-L50] and we will hit errors if we don't include it.

I am struggling to identify a code path that would require {{joda-time}}, though. I cannot even figure out which {{pom.xml}} declares the dependency on {{joda-time}}.
[jira] [Commented] (FLINK-20043) Add flink-sql-connector-kinesis package
[ https://issues.apache.org/jira/browse/FLINK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228568#comment-17228568 ]

Alexander Alexandrov commented on FLINK-20043:
----------------------------------------------
Not really, even for a simple Kinesis source → Print sink dataflow such as

{code:sql}
CREATE TABLE `mm-10139-source` (
  `event_time` TIMESTAMP(3) NOT NULL,
  `name` VARCHAR(32) NOT NULL,
  `age` BIGINT NOT NULL,
  `office` VARCHAR(255) NOT NULL,
  `role` VARCHAR(4) NOT NULL,
  `arrival_time` TIMESTAMP(3) METADATA FROM 'timestamp' VIRTUAL,
  `shard_id` VARCHAR(128) NOT NULL METADATA FROM 'shard-id' VIRTUAL,
  WATERMARK FOR `event_time` AS `event_time` - INTERVAL '5' second
)
PARTITIONED BY (office, role)
WITH (
  'connector' = 'kinesis',
  'stream' = 'mm-10139-source',
  'aws.region' = 'us-east-2',
  'scan.stream.initpos' = 'LATEST',
  'sink.partitioner-field-delimiter' = ';',
  'sink.producer.collection-max-count' = '100',
  'format' = 'json',
  'json.timestamp-format.standard' = 'ISO-8601'
);

CREATE TABLE `print-source` (
  `event_time` TIMESTAMP(3),
  `name` VARCHAR(32),
  `age` BIGINT,
  `office` VARCHAR(255),
  `role` VARCHAR(4)
) WITH (
  'connector' = 'print',
  'print-identifier' = 'source'
);

INSERT INTO `print-source`
SELECT `event_time`, `name`, `age`, `office`, `role`
FROM `mm-10139-source`;
{code}

I get the following error

{code}
org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
	at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:116)
	at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:78)
	at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:224)
	at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:217)
	at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:208)
	at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:534)
	at org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:89)
	at org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:419)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:286)
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:201)
	at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:154)
	at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
	at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
	at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
	at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
	at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
	at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
	at akka.actor.ActorCell.invoke(ActorCell.scala:561)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
	at akka.dispatch.Mailbox.run(Mailbox.scala:225)
	at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
	at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
	at org.apache.flink.kinesis.shaded.com.amazonaws.ClientConfiguration.<clinit>(ClientConfiguration.java:47)
	at org.apache.flink.kinesis.shaded.com.amazonaws.ClientConfigurationFactory.getDefaultConfig(ClientConfigurationFactory.java:46)
	at org.apache.flink.kinesis.shaded.com.amazonaws.ClientConfigurationFactory.getConfig(ClientConfigurationFactory.java:36)
{code}
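The last frames show the root cause: the shaded AWS SDK's {{ClientConfiguration}} initializes a {{commons-logging}} logger in a static field, so the class fails to load when {{commons-logging}} is absent. A quick hypothetical probe (not part of Flink) to check whether the class resolves on a given classpath:

{code:java}
// Hypothetical diagnostic: run with the same classpath as the job to see
// whether commons-logging (needed by the shaded AWS SDK) is resolvable.
public class CommonsLoggingProbe {
    public static void main(String[] args) {
        try {
            Class.forName("org.apache.commons.logging.LogFactory");
            System.out.println("commons-logging is on the classpath");
        } catch (ClassNotFoundException e) {
            System.out.println("commons-logging missing; expect NoClassDefFoundError "
                    + "when the shaded ClientConfiguration class initializes");
        }
    }
}
{code}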
[jira] [Updated] (FLINK-20043) Add flink-sql-connector-kinesis package
[ https://issues.apache.org/jira/browse/FLINK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Alexandrov updated FLINK-20043:
-----------------------------------------
Description:
This is a follow-up item after the recent addition of [Kinesis SQL source and sink in PR #13770|https://github.com/apache/flink/pull/13770].

Create a package that bundles a fat connector jar that can be used by SQL clients. See FLINK-11026 and the related PRs for a discussion of how to do that.

was:
This is a follow-up item after the recent addition of [Kinesis SQL source and sink in PR #13770 |https://github.com/apache/flink/pull/13770].

Create a package that bundles a fat connector jar that can be used by SQL clients. See [FLINK-11026|https://issues.apache.org/jira/browse/FLINK-11026] and the related PRs for a discussion of how to do that.
[jira] [Commented] (FLINK-20043) Add flink-sql-connector-kinesis package
[ https://issues.apache.org/jira/browse/FLINK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228528#comment-17228528 ]

Alexander Alexandrov commented on FLINK-20043:
----------------------------------------------
This is from the bundled {{NOTICE}}
{quote}
This project bundles the following dependencies under the Apache Software License 2.0. (http://www.apache.org/licenses/LICENSE-2.0.txt)

- com.amazonaws:amazon-kinesis-client:1.11.2
- com.amazonaws:amazon-kinesis-producer:0.14.0
- com.amazonaws:aws-java-sdk-core:1.11.754
- com.amazonaws:aws-java-sdk-dynamodb:1.11.603
- com.amazonaws:aws-java-sdk-kinesis:1.11.754
- com.amazonaws:aws-java-sdk-kms:1.11.603
- com.amazonaws:aws-java-sdk-s3:1.11.603
- com.amazonaws:aws-java-sdk-sts:1.11.754
- com.amazonaws:dynamodb-streams-kinesis-adapter:1.5.0
- com.amazonaws:jmespath-java:1.11.754
- org.apache.httpcomponents:httpclient:4.5.9
- org.apache.httpcomponents:httpcore:4.4.6

This project bundles the following dependencies under the BSD license. See bundled license files for details.

- com.google.protobuf:protobuf-java:2.6.1
{quote}
[jira] [Commented] (FLINK-20043) Add flink-sql-connector-kinesis package
[ https://issues.apache.org/jira/browse/FLINK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228525#comment-17228525 ]

Alexander Alexandrov commented on FLINK-20043:
----------------------------------------------
{quote}E.g. I read "Do not relocate guava because it is exposed in the Kinesis API", this should not apply to a SQL connector because everything is controlled by us.{quote}
Just to clarify, you say that in a {{flink-sql-connector-kinesis}} fat jar we can safely relocate Guava code because the {{FlinkKinesisProducer}} (which is marked as {{@PublicEvolving}} and depends on some Guava classes) is not meant to be used directly by {{flink-sql-connector-kinesis}} clients?

{quote}I would make the decision if we need a dedicated {{flink-sql-connector-kinesis}} module up to the content of the current Kinesis jar. Does it include Jackson, Guava, JodaTime?{quote}
Upon manual inspection of the contents of [the {{flink-connector-kinesis_2.12-1.11.2.jar}} found in Maven Central|https://search.maven.org/remotecontent?filepath=org/apache/flink/flink-connector-kinesis_2.12/1.11.2/flink-connector-kinesis_2.12-1.11.2.jar] I cannot find traces of {{jackson}}, {{joda}}, or {{guava}} in the package contents, so I guess the answer is no.
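The same inspection can also be scripted; a hypothetical one-off helper (any JDK 8+, not part of Flink), with the jar path passed as the first argument:

{code:java}
import java.io.IOException;
import java.util.jar.JarFile;
import java.util.zip.ZipEntry;

// Hypothetical helper: print any entries in the given jar whose paths
// hint at bundled jackson, joda-time, or guava classes.
public class BundledDependencyScan {
    public static void main(String[] args) throws IOException {
        try (JarFile jar = new JarFile(args[0])) {
            jar.stream()
                    .map(ZipEntry::getName)
                    .filter(n -> n.contains("jackson") || n.contains("joda") || n.contains("guava"))
                    .forEach(System.out::println);
        }
    }
}
{code}

Empty output would match the manual finding above.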
[jira] [Comment Edited] (FLINK-20043) Add flink-sql-connector-kinesis package
[ https://issues.apache.org/jira/browse/FLINK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228525#comment-17228525 ]

Alexander Alexandrov edited comment on FLINK-20043 at 11/9/20, 11:25 AM:
-------------------------------------------------------------------------
{quote}E.g. I read "Do not relocate guava because it is exposed in the Kinesis API", this should not apply to a SQL connector because everything is controlled by us.{quote}
Just to clarify, you say that in a {{flink-sql-connector-kinesis}} fat jar we can safely relocate Guava code because the {{FlinkKinesisProducer}} (which is marked as {{@PublicEvolving}} and depends on some Guava classes) is not meant to be used directly by {{flink-sql-connector-kinesis}} clients?

{quote}I would make the decision if we need a dedicated {{flink-sql-connector-kinesis}} module up to the content of the current Kinesis jar. Does it include Jackson, Guava, JodaTime?{quote}
Upon manual inspection of the contents of [the {{flink-connector-kinesis_2.12-1.11.2.jar}} found in Maven Central|https://search.maven.org/remotecontent?filepath=org/apache/flink/flink-connector-kinesis_2.12/1.11.2/flink-connector-kinesis_2.12-1.11.2.jar] I cannot find traces of {{jackson}}, {{joda}}, or {{guava}} in the package contents, so I guess the answer is no.

was (Author: aalexandrov):
{quote}E.g. I read "Do not relocate guava because it is exposed in the Kinesis API", this should not apply to a SQL connector because everything is controlled by us.{quote}
Just to clarify, you say that in a {{flink-sql-connector-kinesis}} fat jar we can safely relocate Guava code because the {{FlinkKinesisProducer}} (which is marked as {{@PublicEvolving}} and depends on some Guava classes) is not meant to be used directly by {{flink-sql-connector-kinesis}} clients?

{quote}I would make the decision if we need a dedicated {{flink-sql-connector-kinesis}} module up to the content of the current Kinesis jar. Does it include Jackson, Guava, JodaTime?{quote}
Upon manual inspection of the contents of [the {{flink-connector-kinesis_2.12-1.11.2.jar}} found in Maven Central|https://search.maven.org/remotecontent?filepath=org/apache/flink/flink-connector-kinesis_2.12/1.11.2/flink-connector-kinesis_2.12-1.11.2.jar] I cannot find traces of {{jackson}}, {{joda}}, or {{guava}} in the package contents, so I guess the answer is no.
[jira] [Comment Edited] (FLINK-20043) Add flink-sql-connector-kinesis package
[ https://issues.apache.org/jira/browse/FLINK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227979#comment-17227979 ]

Alexander Alexandrov edited comment on FLINK-20043 at 11/8/20, 12:06 PM:
-------------------------------------------------------------------------
BTW, unlike {{flink-connector-kafka}} and {{flink-connector-elasticsearch[6|7]}} the Maven jars published by the {{flink-connector-kinesis}} package are already fat. In that sense adding a separate {{flink-sql-connector-kinesis}} package just to build a fat jar seems like unnecessary overhead.

Should I remove [the {{maven-shade-plugin}} configuration in {{flink-connector-kinesis}}|https://github.com/apache/flink/blob/322a357f96bf60d3a89fd39ab4ae972bb272a758/flink-connectors/flink-connector-kinesis/pom.xml#L256-L330] as part of this change? This breaks the assumptions made by existing clients (essentially, from 1.12.x, if I want the fat jar I will have to pull the {{flink-sql-connector-kinesis}} package, even if I am not using the Table API or SQL features).

was (Author: aalexandrov):
BTW, unlike {{flink-connector-kafka}} and {{flink-connector-elasticsearch[6|7]}} the Maven jars published by the {{flink-connector-kinesis}} package are already fat. In that sense adding a separate {{flink-sql-connector-kinesis}} package just to build a fat jar seems like unnecessary overhead.

Should I remove [the {{maven-shade-plugin}} configuration in {{flink-connector-kinesis}}|https://github.com/apache/flink/blob/322a357f96bf60d3a89fd39ab4ae972bb272a758/flink-connectors/flink-connector-kinesis/pom.xml#L256-L330] as part of this change? This breaks the assumptions made by existing clients (essentially, from 1.12.x, if I want the fat jar I will have to pull the {{flink-sql-connector-kinesis}} package, even if I am not using the Table API or SQL features.
[jira] [Comment Edited] (FLINK-20043) Add flink-sql-connector-kinesis package
[ https://issues.apache.org/jira/browse/FLINK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227979#comment-17227979 ]

Alexander Alexandrov edited comment on FLINK-20043 at 11/8/20, 12:06 PM:
-------------------------------------------------------------------------
BTW, unlike {{flink-connector-kafka}} and {{flink-connector-elasticsearch[6|7]}} the Maven jars published by the {{flink-connector-kinesis}} package are already fat. In that sense adding a separate {{flink-sql-connector-kinesis}} package just to build a fat jar seems like unnecessary overhead.

Should I remove [the {{maven-shade-plugin}} configuration in {{flink-connector-kinesis}}|https://github.com/apache/flink/blob/322a357f96bf60d3a89fd39ab4ae972bb272a758/flink-connectors/flink-connector-kinesis/pom.xml#L256-L330] as part of this change? This breaks the assumptions made by existing clients (essentially, from 1.12.x, if I want the fat jar I will have to pull the {{flink-sql-connector-kinesis}} package, even if I am not using the Table API or SQL features.

was (Author: aalexandrov):
BTW, unlike {{flink-connector-kafka}} and {{flink-connector-elasticsearch[6|7]}} the Maven jars published by the {{flink-connector-kinesis}} package are already fat. In that sense adding a separate {{flink-sql-connector-kinesis}} package doing the same thing seems like unnecessary overhead.

Should I remove [the {{maven-shade-plugin}} configuration in {{flink-connector-kinesis}}|https://github.com/apache/flink/blob/322a357f96bf60d3a89fd39ab4ae972bb272a758/flink-connectors/flink-connector-kinesis/pom.xml#L256-L330] as part of this change? This breaks the assumptions made by existing clients (essentially, from 1.12.x, if I want the fat jar I will have to pull the {{flink-sql-connector-kinesis}} package, even if I am not using the Table API or SQL features.
[jira] [Comment Edited] (FLINK-20043) Add flink-sql-connector-kinesis package
[ https://issues.apache.org/jira/browse/FLINK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227979#comment-17227979 ]

Alexander Alexandrov edited comment on FLINK-20043 at 11/8/20, 12:05 PM:
-------------------------------------------------------------------------
BTW, unlike {{flink-connector-kafka}} and {{flink-connector-elasticsearch[6|7]}} the Maven jars published by the {{flink-connector-kinesis}} package are already fat. In that sense adding a separate {{flink-sql-connector-kinesis}} package doing the same thing seems like unnecessary overhead.

Should I remove [the {{maven-shade-plugin}} configuration in {{flink-connector-kinesis}}|https://github.com/apache/flink/blob/322a357f96bf60d3a89fd39ab4ae972bb272a758/flink-connectors/flink-connector-kinesis/pom.xml#L256-L330] as part of this change? This breaks the assumptions made by existing clients (essentially, from 1.12.x, if I want the fat jar I will have to pull the {{flink-sql-connector-kinesis}} package, even if I am not using the Table API or SQL features.

was (Author: aalexandrov):
BTW, unlike {{flink-connector-kafka}} and \{{flink-connector-elasticsearch[6|7]}}, the Maven jars published by the {{flink-connector-kinesis}} package are already fat. In that sense adding a separate {{flink-sql-connector-kinesis}} package doing the same thing seems like unnecessary overhead.

Should I remove [the {{maven-shade-plugin}} configuration in {{flink-connector-kinesis}}|https://github.com/apache/flink/blob/322a357f96bf60d3a89fd39ab4ae972bb272a758/flink-connectors/flink-connector-kinesis/pom.xml#L256-L330] as part of this change? This breaks the assumptions made by existing clients (essentially, from 1.12.x, if I want the fat jar I will have to pull the {{flink-sql-connector-kinesis}} package, even if I am not using the Table API or SQL features.
[jira] [Comment Edited] (FLINK-20043) Add flink-sql-connector-kinesis package
[ https://issues.apache.org/jira/browse/FLINK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227979#comment-17227979 ]

Alexander Alexandrov edited comment on FLINK-20043 at 11/8/20, 12:05 PM:
-------------------------------------------------------------------------
BTW, unlike {{flink-connector-kafka}} and {{flink-connector-elasticsearch{6|7}}}, the Maven jars published by the {{flink-connector-kinesis}} package are already fat. In that sense adding a separate {{flink-sql-connector-kinesis}} package doing the same thing seems like unnecessary overhead.

Should I remove [the {{maven-shade-plugin}} configuration in {{flink-connector-kinesis}}|https://github.com/apache/flink/blob/322a357f96bf60d3a89fd39ab4ae972bb272a758/flink-connectors/flink-connector-kinesis/pom.xml#L256-L330] as part of this change? This breaks the assumptions made by existing clients (essentially, from 1.12.x, if I want the fat jar I will have to pull the {{flink-sql-connector-kinesis}} package, even if I am not using the Table API or SQL features.

was (Author: aalexandrov):
BTW the 2.11 and 2.12 jars published by the {{flink-connector-kinesis}} package are already fat (they include shaded transitive AWS dependencies), so adding a separate {{flink-sql-connector-kinesis}} package that does the same seems like an unnecessary overhead.

Should I remove the [maven-shade-plugin|https://github.com/apache/flink/blob/322a357f96bf60d3a89fd39ab4ae972bb272a758/flink-connectors/flink-connector-kinesis/pom.xml#L256-L330] as part of this change?
[jira] [Comment Edited] (FLINK-20043) Add flink-sql-connector-kinesis package
[ https://issues.apache.org/jira/browse/FLINK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227979#comment-17227979 ]

Alexander Alexandrov edited comment on FLINK-20043 at 11/8/20, 12:05 PM:
-------------------------------------------------------------------------
BTW, unlike {{flink-connector-kafka}} and \{{flink-connector-elasticsearch[6|7]}}, the Maven jars published by the {{flink-connector-kinesis}} package are already fat. In that sense adding a separate {{flink-sql-connector-kinesis}} package doing the same thing seems like unnecessary overhead.

Should I remove [the {{maven-shade-plugin}} configuration in {{flink-connector-kinesis}}|https://github.com/apache/flink/blob/322a357f96bf60d3a89fd39ab4ae972bb272a758/flink-connectors/flink-connector-kinesis/pom.xml#L256-L330] as part of this change? This breaks the assumptions made by existing clients (essentially, from 1.12.x, if I want the fat jar I will have to pull the {{flink-sql-connector-kinesis}} package, even if I am not using the Table API or SQL features.

was (Author: aalexandrov):
BTW, unlike {{flink-connector-kafka}} and {{flink-connector-elasticsearch{6|7}}}, the Maven jars published by the {{flink-connector-kinesis}} package are already fat. In that sense adding a separate {{flink-sql-connector-kinesis}} package doing the same thing seems like unnecessary overhead.

Should I remove [the {{maven-shade-plugin}} configuration in {{flink-connector-kinesis}}|https://github.com/apache/flink/blob/322a357f96bf60d3a89fd39ab4ae972bb272a758/flink-connectors/flink-connector-kinesis/pom.xml#L256-L330] as part of this change? This breaks the assumptions made by existing clients (essentially, from 1.12.x, if I want the fat jar I will have to pull the {{flink-sql-connector-kinesis}} package, even if I am not using the Table API or SQL features.
[jira] [Commented] (FLINK-20043) Add flink-sql-connector-kinesis package
[ https://issues.apache.org/jira/browse/FLINK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227979#comment-17227979 ]

Alexander Alexandrov commented on FLINK-20043:
----------------------------------------------
BTW the 2.11 and 2.12 jars published by the {{flink-connector-kinesis}} package are already fat (they include shaded transitive AWS dependencies), so adding a separate {{flink-sql-connector-kinesis}} package that does the same seems like an unnecessary overhead.

Should I remove the [maven-shade-plugin|https://github.com/apache/flink/blob/322a357f96bf60d3a89fd39ab4ae972bb272a758/flink-connectors/flink-connector-kinesis/pom.xml#L256-L330] as part of this change?
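One way to double-check the "already fat" observation is to look for the relocated AWS SDK classes in a downloaded jar, under the relocation prefix {{org.apache.flink.kinesis.shaded.com.amazonaws}} that also appears in the stack trace earlier in this thread. A hypothetical check (not part of Flink), with the jar path passed as the first argument:

{code:java}
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical check: a fat flink-connector-kinesis jar should contain
// AWS SDK classes under the shade-plugin relocation prefix.
public class RelocationCheck {
    public static void main(String[] args) throws IOException {
        try (JarFile jar = new JarFile(args[0])) {
            boolean shaded = jar.stream().anyMatch(entry ->
                    entry.getName().startsWith("org/apache/flink/kinesis/shaded/com/amazonaws/"));
            System.out.println(shaded
                    ? "relocated AWS SDK classes found (fat jar)"
                    : "no relocated AWS SDK classes found (thin jar)");
        }
    }
}
{code}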
[jira] [Commented] (FLINK-20043) Add flink-sql-connector-kinesis package
[ https://issues.apache.org/jira/browse/FLINK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227969#comment-17227969 ]

Alexander Alexandrov commented on FLINK-20043:
----------------------------------------------
{quote}are we allowed to ship a Kinesis fat jar license-wise?{quote}
I ran the following command
{code}
mvn license:third-party-report -pl :flink-connector-kinesis_2.11
{code}
The produced [^third-party-report.html] shows that all AWS packages have the Apache 2.0 License. The ones that I am not sure about are {{javax.xml.bind:jaxb-api:2.3.1}}, {{javax.activation:javax.activation-api:1.2.0}}, and {{org.javassist:javassist:3.24.0-GA}}, which have GPL variants. However, {{org.javassist:javassist:3.24.0-GA}} is also a dependency in {{flink-sql-connector-kafka}}. I have to check whether we strictly need the first two.
[jira] [Updated] (FLINK-20043) Add flink-sql-connector-kinesis package
[ https://issues.apache.org/jira/browse/FLINK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Alexandrov updated FLINK-20043:
-----------------------------------------
Attachment: third-party-report.html
[jira] [Created] (FLINK-20043) Add flink-sql-connector-kinesis package
Alexander Alexandrov created FLINK-20043:
-----------------------------------------
Summary: Add flink-sql-connector-kinesis package
Key: FLINK-20043
URL: https://issues.apache.org/jira/browse/FLINK-20043
Project: Flink
Issue Type: Improvement
Reporter: Alexander Alexandrov

This is a follow-up item after the recent addition of [Kinesis SQL source and sink in PR #13770|https://github.com/apache/flink/pull/13770]. Create a package that bundles a fat connector jar that can be used by SQL clients. See [FLINK-11026|https://issues.apache.org/jira/browse/FLINK-11026] and the related PRs for a discussion of how to do that.
[jira] [Created] (FLINK-20042) Add end-to-end tests for Kinesis Table sources and sinks
Alexander Alexandrov created FLINK-20042:
-----------------------------------------
Summary: Add end-to-end tests for Kinesis Table sources and sinks
Key: FLINK-20042
URL: https://issues.apache.org/jira/browse/FLINK-20042
Project: Flink
Issue Type: Test
Components: Connectors / Kinesis, Table SQL / Ecosystem
Reporter: Alexander Alexandrov

Follow-up issue to add end-to-end tests for the recently added {{KinesisDynamicSource}} and {{KinesisDynamicSink}}. See [the discussion in PR #13770|https://github.com/apache/flink/pull/13770] for details.
[jira] [Commented] (FLINK-18858) Kinesis Flink SQL Connector
[ https://issues.apache.org/jira/browse/FLINK-18858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17214786#comment-17214786 ]

Alexander Alexandrov commented on FLINK-18858:
----------------------------------------------
Hello [~rmetzger], I picked this up from [~danny.cranmer], could you please re-assign this to me? If everything goes smoothly I should be able to open a PR early next week.

> Kinesis Flink SQL Connector
> ---------------------------
>
> Key: FLINK-18858
> URL: https://issues.apache.org/jira/browse/FLINK-18858
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / Kinesis, Table SQL / Ecosystem
> Reporter: Waldemar Hummer
> Assignee: Danny Cranmer
> Priority: Major
>
> Hi all,
> as far as I can see in the [list of connectors|https://github.com/apache/flink/tree/master/flink-connectors], we have a {{[flink-connector-kinesis|https://github.com/apache/flink/tree/master/flink-connectors/flink-connector-kinesis]}} for *programmatic access* to Kinesis streams, but there does not yet seem to exist a *Kinesis SQL connector* (something like {{flink-sql-connector-kinesis}}, analogous to {{flink-sql-connector-kafka}}).
> Our use case would be to enable SQL queries with direct access to Kinesis sources (and potentially sinks), to enable something like the following Flink SQL queries:
> {code:java}
> $ bin/sql-client.sh embedded
> ...
> Flink SQL> CREATE TABLE Orders(`user` string, amount int, rowtime TIME) WITH ('connector' = 'kinesis', ...);
> ...
> Flink SQL> SELECT * FROM Orders ...;
> ...{code}
> I was wondering if this is something that has been considered, or is already actively being worked on? If one of you can provide some guidance, we may be able to work on a PoC implementation to add this functionality.
> (Wasn't able to find an existing issue in the backlog - if this is a duplicate, then please let me know as well.)
> Thanks!
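For completeness, once such a connector exists the same table can also be declared programmatically rather than through the SQL client; a sketch using the Table API, where the {{'kinesis'}} connector options are illustrative placeholders for the not-yet-published connector:

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class OrdersFromKinesisSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());
        // programmatic equivalent of the sql-client session quoted above;
        // the option values are made up for illustration
        tEnv.executeSql(
                "CREATE TABLE Orders (`user` STRING, amount INT, rowtime TIME) WITH ("
                        + "'connector' = 'kinesis',"
                        + "'stream' = 'orders',"
                        + "'aws.region' = 'us-east-1',"
                        + "'format' = 'json')");
        tEnv.executeSql("SELECT * FROM Orders").print();
    }
}
{code}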
[jira] [Assigned] (FLINK-4783) Allow to register TypeInfoFactories manually
[ https://issues.apache.org/jira/browse/FLINK-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Alexandrov reassigned FLINK-4783:
-------------------------------------------
Assignee: Alexander Alexandrov

> Allow to register TypeInfoFactories manually
> --------------------------------------------
>
> Key: FLINK-4783
> URL: https://issues.apache.org/jira/browse/FLINK-4783
> Project: Flink
> Issue Type: Improvement
> Components: Type Serialization System
> Affects Versions: 1.2.0
> Reporter: Till Rohrmann
> Assignee: Alexander Alexandrov
> Priority: Minor
> Fix For: 1.2.0
>
> The newly introduced {{TypeInfoFactories}} (FLINK-3042 and FLINK-3060) allow to create {{TypeInformations}} for types which are annotated with {{TypeInfo}}. This is useful if the user has control over the type for which he wants to generate the {{TypeInformation}}.
> However, annotating a type is not always possible if the type comes from an external library. In this case, it would be good to be able to directly register a {{TypeInfoFactory}} without having to annotate the type.
> The {{TypeExtractor#registerFactory}} already has such a method. However, it is declared private.
[jira] [Comment Edited] (FLINK-4783) Allow to register TypeInfoFactories manually
[ https://issues.apache.org/jira/browse/FLINK-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15562288#comment-15562288 ]

Alexander Alexandrov edited comment on FLINK-4783 at 10/10/16 1:18 PM:
-----------------------------------------------------------------------
Alright, I guess keeping {{TypeExtractor#registerFactory}} static makes sense, but it still does not solve the problem with types from an external library. Maybe delegating through {{ExecutionConfig}} as suggested by [~till.rohrmann] is a cleaner way that will provide means to throw an exception if the registration happens at a wrong position in the code. This will also be on par with the similar method which already exists for registering Kryo serializers.

{code:java}
env.getConfig().registerTypeInfoFactory(Class<?> type, Class<? extends TypeInfoFactory> factory)
{code}

was (Author: aalexandrov):
Alright, I guess keeping {{TypeExtractor#registerFactory}} static makes sense, but it still does not solve the problem with types from an external library. Maybe delegating through {{ExecutionConfig}} as suggested by [~till.rohrmann] is a cleaner way that will provide means to throw an exception if the registration happens at a wrong position in the code. This will also be on par with the similar method which already exists for registering Kryo serializers.

{code:scala}
env.getConfig().registerTypeInfoFactory(Class<?> type, Class<? extends TypeInfoFactory> factory)
{code}
[jira] [Commented] (FLINK-4783) Allow to register TypeInfoFactories manually
[ https://issues.apache.org/jira/browse/FLINK-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15562288#comment-15562288 ]

Alexander Alexandrov commented on FLINK-4783:
---------------------------------------------
Alright, I guess keeping {{TypeExtractor#registerFactory}} static makes sense, but it still does not solve the problem with types from an external library. Maybe delegating through {{ExecutionConfig}} as suggested by [~till.rohrmann] is a cleaner way that will provide means to throw an exception if the registration happens at a wrong position in the code. This will also be on par with the similar method which already exists for registering Kryo serializers.

{code:scala}
env.getConfig().registerTypeInfoFactory(Class<?> type, Class<? extends TypeInfoFactory> factory)
{code}
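To make the proposal concrete, a sketch of what the factory side could look like; the {{registerTypeInfoFactory}} call is the hypothetical API proposed above (it does not exist yet), and {{ThirdPartyPoint}} is a made-up stand-in for a class coming from an external library:

{code:java}
import org.apache.flink.api.common.typeinfo.TypeInfoFactory;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;

import java.lang.reflect.Type;
import java.util.Map;

// Stand-in for a class that lives in an external library and therefore
// cannot be annotated with @TypeInfo.
class ThirdPartyPoint {
    public double x;
    public double y;
}

public class ThirdPartyPointTypeInfoFactory extends TypeInfoFactory<ThirdPartyPoint> {
    @Override
    public TypeInformation<ThirdPartyPoint> createTypeInfo(
            Type t, Map<String, TypeInformation<?>> genericParameters) {
        // any hand-built TypeInformation would do; a POJO-style description
        // is used here purely for illustration
        return Types.POJO(ThirdPartyPoint.class);
    }
}

// Proposed (hypothetical) registration, mirroring the existing
// ExecutionConfig#registerTypeWithKryoSerializer:
//
//   env.getConfig().registerTypeInfoFactory(ThirdPartyPoint.class,
//                                           ThirdPartyPointTypeInfoFactory.class);
{code}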
[jira] [Commented] (FLINK-2858) Cannot build Flink Scala 2.11 with IntelliJ
[ https://issues.apache.org/jira/browse/FLINK-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961846#comment-14961846 ]

Alexander Alexandrov commented on FLINK-2858:
---------------------------------------------
I like the idea of the marker-based activation suggested here: http://stackoverflow.com/a/8391313

What do you think?

> Cannot build Flink Scala 2.11 with IntelliJ
> -------------------------------------------
>
> Key: FLINK-2858
> URL: https://issues.apache.org/jira/browse/FLINK-2858
> Project: Flink
> Issue Type: Bug
> Components: Build System
> Affects Versions: 0.10
> Reporter: Till Rohrmann
>
> If I activate the scala-2.11 profile from within IntelliJ (and thus deactivate the scala-2.10 profile) in order to build Flink with Scala 2.11, then Flink cannot be built. The problem is that some Scala macros cannot be expanded because they were compiled with the wrong version (I assume 2.10). This makes debugging tests with Scala 2.11 in IntelliJ impossible.
[jira] [Comment Edited] (FLINK-2858) Cannot build Flink Scala 2.11 with IntelliJ
[ https://issues.apache.org/jira/browse/FLINK-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959689#comment-14959689 ]

Alexander Alexandrov edited comment on FLINK-2858 at 10/15/15 10:29 PM:
------------------------------------------------------------------------
[~trohrm...@apache.org] After trying around for a while I managed to overcome the issue. Here are the required steps
* Run the `tools/change-scala-version.sh` script
* Run `mvn clean` from the console
* Optionally, forcefully activate the `scala-2.11` profile & deactivate `scala-2.10` from the IntelliJ Maven panel. Strictly speaking this is not needed because after running the script the default profile is changed to `scala-2.11`.
* In IntelliJ, click "Generate Sources and Update Folders for all Projects" in the same panel (but maybe a simple "Reimport all Maven projects" will be enough)
* In IntelliJ, run "Build > Rebuild Project"

This worked for me on the latest IntelliJ Ultimate (14.1.5).

was (Author: aalexandrov):
[~trohrm...@apache.org] After trying around for a while I managed to overcome the issue. Here are the steps
* Run the `tools/change-scala-version.sh` script
* Run `mvn clean` from the console
* Optionally, forcefully activate the `scala-2.11` profile & deactivate `scala-2.10` from the IntelliJ Maven panel. Strictly speaking this is not needed because after running the script the default profile is changed to `scala-2.11`.
* In IntelliJ, click "Generate Sources and Update Folders for all Projects" in the same panel (but maybe a simple "Reimport all Maven projects" will be enough)
* In IntelliJ, run "Build > Rebuild Project"

This worked for me on the latest IntelliJ Ultimate (14.1.5).
[jira] [Comment Edited] (FLINK-2858) Cannot build Flink Scala 2.11 with IntelliJ
[ https://issues.apache.org/jira/browse/FLINK-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959689#comment-14959689 ]

Alexander Alexandrov edited comment on FLINK-2858 at 10/15/15 10:28 PM:
------------------------------------------------------------------------
[~trohrm...@apache.org] After trying around for a while I managed to overcome the issue. Here are the steps
* Run the `tools/change-scala-version.sh` script
* Run `mvn clean` from the console
* Optionally, forcefully activate the `scala-2.11` profile & deactivate `scala-2.10` from the IntelliJ Maven panel. Strictly speaking this is not needed because after running the script the default profile is changed to `scala-2.11`.
* Click "Generate Sources and Update Folders for all Projects" in the same panel (but maybe a simple "Reimport all Maven projects" will be enough)
* Run "Build > Rebuild Project"

This worked for me on the latest IntelliJ Ultimate (14.1.5).

was (Author: aalexandrov):
[~trohrm...@apache.org] After trying around for a while I managed to overcome the issue. Here are the steps
* Run the `tools/change-scala-version.sh` script
* Run `mvn clean` from the console
* Activate the `scala-2.11` profile & deactivate `scala-2.10` from the IntelliJ Maven panel
* Click "Generate Sources and Update Folders for all Projects" in the same panel (but maybe a simple "Reimport all Maven projects" will be enough)
* Run "Build > Rebuild Project"

This worked for me on the latest IntelliJ Ultimate (14.1.5).
[jira] [Commented] (FLINK-2858) Cannot build Flink Scala 2.11 with IntelliJ
[ https://issues.apache.org/jira/browse/FLINK-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959750#comment-14959750 ]

Alexander Alexandrov commented on FLINK-2858:
---------------------------------------------
I think that the current build description in the docs does not properly reflect what needs to be done. All you need to do is run the `tools/change-scala-version.sh` script; the default Scala profile is then changed automatically. I tried to update this in [PR 1260|https://github.com/apache/flink/pull/1260].
[jira] [Commented] (FLINK-2858) Cannot build Flink Scala 2.11 with IntelliJ
[ https://issues.apache.org/jira/browse/FLINK-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959689#comment-14959689 ]

Alexander Alexandrov commented on FLINK-2858:
---------------------------------------------
[~trohrm...@apache.org] After trying around for a while I managed to overcome the issue. Here are the steps
* Run the `tools/change-scala-version.sh` script
* Run `mvn clean` from the console
* Activate the `scala-2.11` profile & deactivate `scala-2.10` from the IntelliJ Maven panel
* Click "Generate Sources and Update Folders for all Projects" in the same panel (but maybe a simple "Reimport all Maven projects" will be enough)
* Run "Build > Rebuild Project"

This worked for me on the latest IntelliJ Ultimate (14.1.5).
[jira] [Comment Edited] (FLINK-2858) Cannot build Flink Scala 2.11 with IntelliJ
[ https://issues.apache.org/jira/browse/FLINK-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959689#comment-14959689 ]

Alexander Alexandrov edited comment on FLINK-2858 at 10/15/15 10:27 PM:
------------------------------------------------------------------------
[~trohrm...@apache.org] After trying around for a while I managed to overcome the issue. Here are the steps
* Run the `tools/change-scala-version.sh` script
* Run `mvn clean` from the console
* Optionally, forcefully activate the `scala-2.11` profile & deactivate `scala-2.10` from the IntelliJ Maven panel. Strictly speaking this is not needed because after running the script the default profile is changed to `scala-2.11`.
* Click "Generate Sources and Update Folders for all Projects" in the same panel (but maybe a simple "Reimport all Maven projects" will be enough)
* Run "Build > Rebuild Project"

This worked for me on the latest IntelliJ Ultimate (14.1.5).

was (Author: aalexandrov):
[~trohrm...@apache.org] After trying around for a while I managed to overcome the issue. Here are the steps
* Run the `tools/change-scala-version.sh` script
* Run `mvn clean` from the console
* Activate the `scala-2.11` profile & deactivate `scala-2.10` from the IntelliJ Maven panel
* Click "Generate Sources and Update Folders for all Projects" in the same panel (but maybe a simple "Reimport all Maven projects" will be enough)
* Run "Build > Rebuild Project"

This worked for me on the latest IntelliJ Ultimate (14.1.5).
[jira] [Comment Edited] (FLINK-2858) Cannot build Flink Scala 2.11 with IntelliJ
[ https://issues.apache.org/jira/browse/FLINK-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959689#comment-14959689 ]

Alexander Alexandrov edited comment on FLINK-2858 at 10/15/15 10:30 PM:
------------------------------------------------------------------------
[~trohrm...@apache.org] After trying around for a while I managed to overcome the issue. Here are the required steps
* Run the {{tools/change-scala-version.sh}} script
* Run {{mvn clean}} from the console
* Optionally, forcefully activate the {{scala-2.11}} profile & deactivate {{scala-2.10}} from the IntelliJ Maven panel. Strictly speaking this is not needed because after running the script the default profile is changed to {{scala-2.11}}.
* In IntelliJ, click "Generate Sources and Update Folders for all Projects" in the same panel (but maybe a simple "Reimport all Maven projects" will be enough)
* In IntelliJ, run "Build > Rebuild Project"

This worked for me on the latest IntelliJ Ultimate (14.1.5).

was (Author: aalexandrov):
[~trohrm...@apache.org] After trying around for a while I managed to overcome the issue. Here are the required steps
* Run the `tools/change-scala-version.sh` script
* Run `mvn clean` from the console
* Optionally, forcefully activate the `scala-2.11` profile & deactivate `scala-2.10` from the IntelliJ Maven panel. Strictly speaking this is not needed because after running the script the default profile is changed to `scala-2.11`.
* In IntelliJ, click "Generate Sources and Update Folders for all Projects" in the same panel (but maybe a simple "Reimport all Maven projects" will be enough)
* In IntelliJ, run "Build > Rebuild Project"

This worked for me on the latest IntelliJ Ultimate (14.1.5).
[jira] [Comment Edited] (FLINK-2858) Cannot build Flink Scala 2.11 with IntelliJ
[ https://issues.apache.org/jira/browse/FLINK-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959689#comment-14959689 ] Alexander Alexandrov edited comment on FLINK-2858 at 10/15/15 10:29 PM: [~trohrm...@apache.org] After trying around for a while I managed to overcome the issue. Here are the steps * Run the `tools/change-scala-version.sh` script * Run `mvn clean` from the console * Optionally, forcefully activate the `scala-2.11` profile & deactivate `scala-2.10` from the IntelliJ Maven panel. Strictly speaking this is not needed because after running the script the default profile is changed to `scala-2.11`. * In IntelliJ, click "Generate Sources and Update Folders for all Projects" in the same panel (but maybe a simple "Reimport all Maven projects" will be enough) * In IntelliJ, run "Build > Rebuild Project" This worked for me on the latest IntelliJ Ultimate (14.1.5). was (Author: aalexandrov): [~trohrm...@apache.org] After trying around for a while I managed to overcome the issue. Here are the steps * Run the `tools/change-scala-version.sh` script * Run `mvn clean` from the console * Optionally, forcefully activate the `scala-2.11` profile & deactivate `scala-2.10` from the IntelliJ Maven panel. Strictly speaking this is not needed because after running the script the default profile is changed to `scala-2.11`. * In IntelliJ, click "Generate Sources and Update Folders for all Projects" in the same panel (but maybe a simple "Reimport all Maven projects" will be enough) * In IntelliJ, run "Build > Rebuild Project" This worked for me on the latest IntelliJ Ultimate (14.1.5). > Cannot build Flink Scala 2.11 with IntelliJ > --- > > Key: FLINK-2858 > URL: https://issues.apache.org/jira/browse/FLINK-2858 > Project: Flink > Issue Type: Bug > Components: Build System >Affects Versions: 0.10 >Reporter: Till Rohrmann > > If I activate the scala-2.11 profile from within IntelliJ (and thus > deactivate the scala-2.10 profile) in order to build Flink with Scala 2.11, > then Flink cannot be built. The problem is that some Scala macros cannot be > expanded because they were compiled with the wrong version (I assume 2.10). > This makes debugging tests with Scala 2.11 in IntelliJ impossible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2858) Cannot build Flink Scala 2.11 with IntelliJ
[ https://issues.apache.org/jira/browse/FLINK-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959761#comment-14959761 ] Alexander Alexandrov commented on FLINK-2858: - I think that it also makes sense to place a comment on the scala-2.11 profiles indicating that these are not meant to be overridden by the user. We can also try to pull the {{scala.binary.version}} property out to the general {{properties}} section and rework the profiles so they activate themselves based on the current {{scala.binary.version}} value, but AFAIR [~rmetzger] mentioned once that property-based activation is not really supported by Maven at the moment. > Cannot build Flink Scala 2.11 with IntelliJ > --- > > Key: FLINK-2858 > URL: https://issues.apache.org/jira/browse/FLINK-2858 > Project: Flink > Issue Type: Bug > Components: Build System >Affects Versions: 0.10 >Reporter: Till Rohrmann > > If I activate the scala-2.11 profile from within IntelliJ (and thus > deactivate the scala-2.10 profile) in order to build Flink with Scala 2.11, > then Flink cannot be built. The problem is that some Scala macros cannot be > expanded because they were compiled with the wrong version (I assume 2.10). > This makes debugging tests with Scala 2.11 in IntelliJ impossible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-2311) Set flink-* dependencies in flink-contrib as provided
[ https://issues.apache.org/jira/browse/FLINK-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Alexandrov updated FLINK-2311: External issue URL: https://github.com/apache/flink/pull/880 External issue ID: 880 Set flink-* dependencies in flink-contrib as provided --- Key: FLINK-2311 URL: https://issues.apache.org/jira/browse/FLINK-2311 Project: Flink Issue Type: Improvement Components: flink-contrib Affects Versions: 0.10, 0.9.1 Reporter: Alexander Alexandrov Assignee: Alexander Alexandrov Priority: Minor Labels: easyfix, maven, patch Fix For: 0.10, 0.9.1 The {{flink-contrib}} folder is assumed to be provided by the user. As such, other {{flink-*}} dependencies referenced within {{flink-contrib}} should be set as _'provided'_ in order to keep the size of the user jars down. I'm currently testing a patch that changes the poms as suggested and will open a PR on GitHub if everything passes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2311) Set flink-* dependencies in flink-contrib as provided
Alexander Alexandrov created FLINK-2311: --- Summary: Set flink-* dependencies in flink-contrib as provided Key: FLINK-2311 URL: https://issues.apache.org/jira/browse/FLINK-2311 Project: Flink Issue Type: Improvement Components: flink-contrib Affects Versions: 0.10, 0.9.1 Reporter: Alexander Alexandrov Assignee: Alexander Alexandrov Priority: Minor Fix For: 0.10, 0.9.1 The {{flink-contrib}} folder is assumed to be provided by the user. As such, other {{flink-*}} dependencies referenced within {{flink-contrib}} should be set as _'provided'_ in order to keep the size of the user jars down. I'm currently testing a patch that changes the poms as suggested and will open a PR on GitHub if everything passes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2237) Add hash-based Aggregation
[ https://issues.apache.org/jira/browse/FLINK-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14590727#comment-14590727 ] Alexander Alexandrov commented on FLINK-2237: - In the Flink RT, operators are represented in the `PactDriver` family. I suggest proceeding as follows * read the code of an existing aggregate driver implementation (e.g. the `GroupReduceCombineDriver`); * use this as a starting point: create a copy of the class, and adapt it as a first sketch of your hash-based driver; * implement a test for your driver (you can use the corresponding `GroupReduceCombineDriver` test); * push the sketch to a repository and summarize your attempt here. We can then go over it and give you more concrete feedback. Add hash-based Aggregation -- Key: FLINK-2237 URL: https://issues.apache.org/jira/browse/FLINK-2237 Project: Flink Issue Type: New Feature Reporter: Rafiullah Momand Priority: Minor Labels: github-import Fix For: pre-apache Aggregation functions at the moment are implemented in a sort-based way. How can we implement hash-based Aggregation for Flink? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
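For orientation, a minimal, self-contained sketch of the hash-based aggregation idea suggested in the comment above — conceptual code only, not Flink's actual {{PactDriver}} interface:
{code}
import scala.collection.mutable

object HashAggregateSketch extends App {
  // Hash-based aggregation: fold records into a hash table keyed by the grouping
  // key instead of sorting the input first (the current sort-based strategy).
  def hashAggregate[K, V](records: Iterator[(K, V)], combine: (V, V) => V): Iterator[(K, V)] = {
    val table = mutable.HashMap.empty[K, V]
    for ((key, value) <- records)
      table(key) = table.get(key).fold(value)(combine(_, value))
    table.iterator
  }

  // prints ("a", 4) and ("b", 2), produced without sorting the input:
  hashAggregate(Iterator("a" -> 1, "b" -> 2, "a" -> 3), (x: Int, y: Int) => x + y).foreach(println)
}
{code}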
[jira] [Assigned] (FLINK-2231) Create a Serializer for Scala Enumerations
[ https://issues.apache.org/jira/browse/FLINK-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Alexandrov reassigned FLINK-2231: --- Assignee: Alexander Alexandrov Create a Serializer for Scala Enumerations -- Key: FLINK-2231 URL: https://issues.apache.org/jira/browse/FLINK-2231 Project: Flink Issue Type: Improvement Components: Scala API Reporter: Stephan Ewen Assignee: Alexander Alexandrov Scala Enumerations are currently serialized with Kryo, but should be efficiently serialized by just writing the {{initial}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2231) Create a Serializer for Scala Enumerations
[ https://issues.apache.org/jira/browse/FLINK-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589814#comment-14589814 ] Alexander Alexandrov commented on FLINK-2231: - If the point of the ticket is to extend the TypeInformation synthesis macro, I think I can handle that these days. [~StephanEwen] What do you mean by {{initial}}? Create a Serializer for Scala Enumerations -- Key: FLINK-2231 URL: https://issues.apache.org/jira/browse/FLINK-2231 Project: Flink Issue Type: Improvement Components: Scala API Reporter: Stephan Ewen Assignee: Alexander Alexandrov Scala Enumerations are currently serialized with Kryo, but should be efficiently serialized by just writing the {{initial}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566564#comment-14566564 ] Alexander Alexandrov commented on FLINK-1731: - [~till.rohrmann] I couldn't find it either. I think we were discussing doing the K-Means|| as a separate issue. Florian Gößler also reported the following issue when he tried to rebase {noformat} Error:(200, 75) ambiguous implicit values: both value denseVectorConverter in object BreezeVectorConverter of type => org.apache.flink.ml.math.BreezeVectorConverter[org.apache.flink.ml.math.DenseVector] and value sparseVectorConverter in object BreezeVectorConverter of type => org.apache.flink.ml.math.BreezeVectorConverter[org.apache.flink.ml.math.SparseVector] match expected type org.apache.flink.ml.math.BreezeVectorConverter[T] .reduce((p1, p2) => (p1._1, (p1._2.asBreeze + p2._2.asBreeze).fromBreeze, p1._3 + p2._3)) {noformat} Any idea what might be the cause? Add kMeans clustering algorithm to machine learning library --- Key: FLINK-1731 URL: https://issues.apache.org/jira/browse/FLINK-1731 Project: Flink Issue Type: New Feature Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Peter Schrott Labels: ML The Flink repository already contains a kMeans implementation but it is not yet ported to the machine learning library. I assume that only the used data types have to be adapted and then it can be more or less directly moved to flink-ml. The kMeans++ [1] and the kMeans|| [2] algorithms constitute a better implementation because they improve the initial seeding phase to achieve near-optimal clustering. It might be worthwhile to implement kMeans||. Resources: [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
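The "ambiguous implicit values" error above can be reproduced in isolation: when the type parameter that drives the implicit search is left unconstrained, every converter in scope matches. A self-contained sketch with hypothetical names (not the flink-ml code):
{code}
trait BreezeLikeConverter[T] { def fromBreeze(v: Array[Double]): T }

object Converters {
  implicit val dense: BreezeLikeConverter[Vector[Double]] =
    new BreezeLikeConverter[Vector[Double]] { def fromBreeze(v: Array[Double]): Vector[Double] = v.toVector }
  implicit val sparse: BreezeLikeConverter[List[Double]] =
    new BreezeLikeConverter[List[Double]] { def fromBreeze(v: Array[Double]): List[Double] = v.toList }
}

object AmbiguityDemo extends App {
  import Converters._

  def convert[T](v: Array[Double])(implicit c: BreezeLikeConverter[T]): T = c.fromBreeze(v)

  // convert(Array(1.0, 2.0))  // error: ambiguous implicit values — T is unconstrained,
  //                           // both `dense` and `sparse` match BreezeLikeConverter[T]
  val ok = convert[Vector[Double]](Array(1.0, 2.0)) // compiles: T pinned explicitly
  println(ok)
}
{code}
In the {{reduce}} above the same thing presumably happens: the element type is not pinned at the point where {{fromBreeze}} is called, so annotating the expected type (or threading a single {{BreezeVectorConverter}} context bound through the enclosing method) should resolve it.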
[jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552113#comment-14552113 ] Alexander Alexandrov commented on FLINK-1731: - [~till.rohrmann] Can we merge this with the current set of features and then add the automatic picking of the initial centroids in another issue? Add kMeans clustering algorithm to machine learning library --- Key: FLINK-1731 URL: https://issues.apache.org/jira/browse/FLINK-1731 Project: Flink Issue Type: New Feature Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Peter Schrott Labels: ML The Flink repository already contains a kMeans implementation but it is not yet ported to the machine learning library. I assume that only the used data types have to be adapted and then it can be more or less directly moved to flink-ml. The kMeans++ [1] and the kMeans|| [2] algorithms constitute a better implementation because they improve the initial seeding phase to achieve near-optimal clustering. It might be worthwhile to implement kMeans||. Resources: [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2044) Implementation of Gelly Algorithm HITS
[ https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550568#comment-14550568 ] Alexander Alexandrov commented on FLINK-2044: - The [feature branch can be found here|https://github.com/JavidMayar/flink/commits/HITS]. Implementation of Gelly Algorithm HITS -- Key: FLINK-2044 URL: https://issues.apache.org/jira/browse/FLINK-2044 Project: Flink Issue Type: Improvement Components: Gelly Reporter: Ahamd Javid Priority: Minor Attachments: HitsMain.java, Hits_Class.java Implementation of Hits Algorithm in Gelly API using Java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2023) TypeExtractor does not work for (some) Scala Classes
[ https://issues.apache.org/jira/browse/FLINK-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547706#comment-14547706 ] Alexander Alexandrov commented on FLINK-2023: - For a lot of operators, Flink's DataSet API provides overloaded method signatures where a user can explicitly pass a TypeInformation object. If this is not given, Flink falls back to the TypeExtractor logic. If you follow a similar pattern in Gelly, you can 1. wrap the Java API in Scala; 2. synthesize the TypeInformation using the already present Scala macros; 3. pass the synthesized object using the explicit signatures in Gelly. As a pre-requisite for that, the TypeExtraction logic should be extended in order to work with Scala base types. TypeExtractor does not work for (some) Scala Classes Key: FLINK-2023 URL: https://issues.apache.org/jira/browse/FLINK-2023 Project: Flink Issue Type: Bug Components: Type Serialization System Reporter: Aljoscha Krettek [~vanaepi] discovered some problems while working on the Scala Gelly API where, for example, a Scala MapFunction can not be correctly analyzed by the type extractor. For example, generic types will not be correctly detected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
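A hedged sketch of the wrapping pattern described in the comment above, assuming {{flink-scala}} on the classpath; {{withSynthesizedTypeInfo}} and the Java-API call it stands in for are hypothetical:
{code}
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.api.scala._ // provides the implicit TypeInformation-synthesizing macro

object ExplicitTypeInfoDemo extends App {
  // The context bound [R : TypeInformation] makes the compiler synthesize the
  // TypeInformation at the call site (step 2 above); the wrapper then hands it
  // to a Java-API method that accepts the type information explicitly (step 3 above).
  def withSynthesizedTypeInfo[R: TypeInformation](passToJavaApi: TypeInformation[R] => Unit): Unit =
    passToJavaApi(implicitly[TypeInformation[R]]) // synthesized by the macro, no TypeExtractor involved

  // here the macro materializes TypeInformation[(Long, Double)]:
  withSynthesizedTypeInfo[(Long, Double)](info => println(info))
}
{code}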
[jira] [Commented] (FLINK-2023) TypeExtractor does not work for (some) Scala Classes
[ https://issues.apache.org/jira/browse/FLINK-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547764#comment-14547764 ] Alexander Alexandrov commented on FLINK-2023: - This was my basic idea, yes. In the example you have given, things could probably be simplified a bit, as the type info related to the input is already properly set. In the example above, you just need to pass the {{returnType}}. TypeExtractor does not work for (some) Scala Classes Key: FLINK-2023 URL: https://issues.apache.org/jira/browse/FLINK-2023 Project: Flink Issue Type: Bug Components: Type Serialization System Reporter: Aljoscha Krettek [~vanaepi] discovered some problems while working on the Scala Gelly API where, for example, a Scala MapFunction can not be correctly analyzed by the type extractor. For example, generic types will not be correctly detected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2023) TypeExtractor does not work for (some) Scala Classes
[ https://issues.apache.org/jira/browse/FLINK-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547105#comment-14547105 ] Alexander Alexandrov commented on FLINK-2023: - I suggest having a thin layer that implicitly provides the TypeInformation instances over the Gelly Java API in Scala (similar to what [~aljoscha] has done with the DataSet Scala API). TypeExtractor does not work for (some) Scala Classes Key: FLINK-2023 URL: https://issues.apache.org/jira/browse/FLINK-2023 Project: Flink Issue Type: Bug Components: Type Serialization System Reporter: Aljoscha Krettek [~vanaepi] discovered some problems while working on the Scala Gelly API where, for example, a Scala MapFunction can not be correctly analyzed by the type extractor. For example, generic types will not be correctly detected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2023) TypeExtractor does not work for (some) Scala Classes
[ https://issues.apache.org/jira/browse/FLINK-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546969#comment-14546969 ] Alexander Alexandrov commented on FLINK-2023: - I think there is no easy way around this. Generic parameters in Scala classes are erased and at the Java level (on which the TypeExtractor operates) you will see only {{java.lang.Object}} instances instead. For a more detailed description of the problem with some examples check [this Stackoverflow entry|http://stackoverflow.com/questions/20749536/why-dont-scala-primitives-show-up-as-type-parameters-in-java-reflection]. TypeExtractor does not work for (some) Scala Classes Key: FLINK-2023 URL: https://issues.apache.org/jira/browse/FLINK-2023 Project: Flink Issue Type: Bug Components: Type Serialization System Reporter: Aljoscha Krettek [~vanaepi] discovered some problems while working on the Scala Gelly API where, for example, a Scala MapFunction can not be correctly analyzed by the type extractor. For example, generic types will not be correctly detected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
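A tiny demonstration of the erasure problem described in the comment above — plain Scala plus Java reflection, no Flink involved:
{code}
object ErasureDemo extends App {
  case class Wrapper[T](value: T)

  val w = Wrapper(42)
  // Java reflection (which the TypeExtractor builds on) only sees the erased type:
  println(w.getClass.getDeclaredField("value").getType) // prints: class java.lang.Object
  // The fact that T = Int here is known to scalac, but is gone at the JVM level.
}
{code}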
[jira] [Commented] (FLINK-2023) TypeExtractor does not work for (some) Scala Classes
[ https://issues.apache.org/jira/browse/FLINK-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546970#comment-14546970 ] Alexander Alexandrov commented on FLINK-2023: - Long story short - you cannot use the Java API directly from Scala (i.e., with Scala types) and rely on correct type inference from the TypeExtractor. TypeExtractor does not work for (some) Scala Classes Key: FLINK-2023 URL: https://issues.apache.org/jira/browse/FLINK-2023 Project: Flink Issue Type: Bug Components: Type Serialization System Reporter: Aljoscha Krettek [~vanaepi] discovered some problems while working on the Scala Gelly API where, for example, a Scala MapFunction can not be correctly analyzed by the type extractor. For example, generic types will not be correctly detected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543734#comment-14543734 ] Alexander Alexandrov commented on FLINK-1731: - I would go with a {{DataSet}} for the centroids as well. That said, we can reduce syntax at the client side by providing either - an implicit converter of type {{Seq\[A\] => DataSet\[A\]}} (needs to be part of the Flink Scala API, could be already there), or - an overloaded {{setCentroids(Seq\[A\])}} setter. Add kMeans clustering algorithm to machine learning library --- Key: FLINK-1731 URL: https://issues.apache.org/jira/browse/FLINK-1731 Project: Flink Issue Type: New Feature Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Peter Schrott Labels: ML The Flink repository already contains a kMeans implementation but it is not yet ported to the machine learning library. I assume that only the used data types have to be adapted and then it can be more or less directly moved to flink-ml. The kMeans++ [1] and the kMeans|| [2] algorithms constitute a better implementation because they improve the initial seeding phase to achieve near-optimal clustering. It might be worthwhile to implement kMeans||. Resources: [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
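A hedged sketch of the two options from the comment above, assuming {{flink-scala}} on the classpath; {{KMeansLike}} and its centroid type are hypothetical placeholders:
{code}
import org.apache.flink.api.scala._

class KMeansLike(env: ExecutionEnvironment) {
  private var centroids: Option[DataSet[(Double, Double)]] = None

  // option 1: the proper input, passed as a DataSet
  def setCentroids(c: DataSet[(Double, Double)]): this.type = { centroids = Some(c); this }

  // option 2: overloaded convenience setter for small local collections
  def setCentroids(c: Seq[(Double, Double)]): this.type = setCentroids(env.fromCollection(c))
}

// An implicit Seq[A] => DataSet[A] converter would achieve the same without the
// overload, but it needs an ExecutionEnvironment in scope to create the DataSet:
// implicit def seqToDataSet[A: TypeInformation: ClassTag](s: Seq[A])
//     (implicit env: ExecutionEnvironment): DataSet[A] = env.fromCollection(s)
{code}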
[jira] [Updated] (FLINK-1743) Add multinomial logistic regression to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Alexandrov updated FLINK-1743: Assignee: (was: Alexander Alexandrov) Add multinomial logistic regression to machine learning library --- Key: FLINK-1743 URL: https://issues.apache.org/jira/browse/FLINK-1743 Project: Flink Issue Type: New Feature Components: Machine Learning Library Reporter: Till Rohrmann Labels: ML Multinomial logistic regression [1] would be a good first classification algorithm which can classify multiple classes. Resources: [1] [http://en.wikipedia.org/wiki/Multinomial_logistic_regression] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-1731) Add kMeans clustering algorithm to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Alexandrov updated FLINK-1731: Assignee: (was: Alexander Alexandrov) Add kMeans clustering algorithm to machine learning library --- Key: FLINK-1731 URL: https://issues.apache.org/jira/browse/FLINK-1731 Project: Flink Issue Type: New Feature Components: Machine Learning Library Reporter: Till Rohrmann Labels: ML The Flink repository already contains a kMeans implementation but it is not yet ported to the machine learning library. I assume that only the used data types have to be adapted and then it can be more or less directly moved to flink-ml. The kMeans++ [1] and the kMeans|| [2] algorithms constitute a better implementation because they improve the initial seeding phase to achieve near-optimal clustering. It might be worthwhile to implement kMeans||. Resources: [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543643#comment-14543643 ] Alexander Alexandrov commented on FLINK-1731: - [~peedeeX21] For some reason I cannot assign this to you directly. I cleared the assignee field so you can assign the issue to yourself. Add kMeans clustering algorithm to machine learning library --- Key: FLINK-1731 URL: https://issues.apache.org/jira/browse/FLINK-1731 Project: Flink Issue Type: New Feature Components: Machine Learning Library Reporter: Till Rohrmann Labels: ML The Flink repository already contains a kMeans implementation but it is not yet ported to the machine learning library. I assume that only the used data types have to be adapted and then it can be more or less directly moved to flink-ml. The kMeans++ [1] and the kMeans|| [2] algorithms constitute a better implementation because they improve the initial seeding phase to achieve near-optimal clustering. It might be worthwhile to implement kMeans||. Resources: [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (FLINK-1731) Add kMeans clustering algorithm to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543734#comment-14543734 ] Alexander Alexandrov edited comment on FLINK-1731 at 5/14/15 2:35 PM: -- I would go with a {{DataSet}} for the centroids as well. That said, we can reduce syntax at the client side by providing either - an overloaded {{setCentroids(Seq\[A\])}} setter, or - an implicit converter of type {{Seq\[A\] => DataSet\[A\]}} (needs to be part of the Flink Scala API, could be already there) which allows passing a {{Seq\[A\]}} argument to a {{setCentroids(DataSet\[A\])}} setter. was (Author: aalexandrov): I would go with a {{DataSet}} for the centroids as well. That said, we can reduce syntax at the client side by providing either - an implicit converter that {{Seq\[A\] => DataSet\[A\]}} (needs to be part of the Flink Scala API, could be already there), or - an overloaded {{setCentroids(Seq\[A\])}} setter. Add kMeans clustering algorithm to machine learning library --- Key: FLINK-1731 URL: https://issues.apache.org/jira/browse/FLINK-1731 Project: Flink Issue Type: New Feature Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Peter Schrott Labels: ML The Flink repository already contains a kMeans implementation but it is not yet ported to the machine learning library. I assume that only the used data types have to be adapted and then it can be more or less directly moved to flink-ml. The kMeans++ [1] and the kMeans|| [2] algorithms constitute a better implementation because they improve the initial seeding phase to achieve near-optimal clustering. It might be worthwhile to implement kMeans||. Resources: [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541570#comment-14541570 ] Alexander Alexandrov commented on FLINK-1731: - I suggest adding the initial centroids as a proper parameter and not as part of the ParameterMap, since they are an actual input to the algorithm (as opposed to an algorithm hyper-parameter). Add kMeans clustering algorithm to machine learning library --- Key: FLINK-1731 URL: https://issues.apache.org/jira/browse/FLINK-1731 Project: Flink Issue Type: New Feature Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Alexander Alexandrov Labels: ML The Flink repository already contains a kMeans implementation but it is not yet ported to the machine learning library. I assume that only the used data types have to be adapted and then it can be more or less directly moved to flink-ml. The kMeans++ [1] and the kMeans|| [2] algorithms constitute a better implementation because they improve the initial seeding phase to achieve near-optimal clustering. It might be worthwhile to implement kMeans||. Resources: [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1959) Accumulators BROKEN after Partitioning
[ https://issues.apache.org/jira/browse/FLINK-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540146#comment-14540146 ] Alexander Alexandrov commented on FLINK-1959: - After some more debugging with [~elbehery] we managed to identify the issue. Adding {{partitionByHash}} into the pipeline breaks the local chain in two and creates two tasks. The second task (the receiver of the partition) starts with a RegularPactTask which runs a NoOpDriver and does not have a UDF stub. Because of that, the check at [line 511|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/operators/RegularPactTask.java#L511] fails and the accumulators are never reported. @[~StephanEwen]: Unless you see a problem with that solution, I suggest removing the {{stub != null}} check at [line 511|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/operators/RegularPactTask.java#L511] and always reporting the accumulators. Regards, Alexander Accumulators BROKEN after Partitioning -- Key: FLINK-1959 URL: https://issues.apache.org/jira/browse/FLINK-1959 Project: Flink Issue Type: Bug Components: Examples Affects Versions: master Reporter: mustafa elbehery Priority: Critical Fix For: master while running the Accumulator example in https://github.com/Elbehery/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/relational/EmptyFieldsCountAccumulator.java, I tried to alter the data flow with PartitionByHash function before applying Filter, and the resulted accumulator was NULL. By Debugging, I could see the accumulator in the RunTime Map. However, by retrieving the accumulator from the JobExecutionResult object, it was NULL. The line caused the problem is file.partitionByHash(1).filter(new EmptyFieldFilter()) instead of file.filter(new EmptyFieldFilter()) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
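A hedged pseudo-sketch of the guard described above — not the actual {{RegularPactTask}} code, just the shape of the proposed change (hypothetical names throughout):
{code}
object AccumulatorGuardSketch {
  def finishTask(stub: AnyRef, accumulators: Map[String, Any],
                 report: Map[String, Any] => Unit): Unit = {
    // current behavior: accumulators are reported only when a UDF stub exists,
    // so the NoOpDriver task that partitionByHash introduces silently drops them
    if (stub != null) {
      report(accumulators)
    }
    // proposed fix: remove the guard and call report(accumulators) unconditionally
  }
}
{code}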
[jira] [Commented] (FLINK-1999) TF-IDF transformer
[ https://issues.apache.org/jira/browse/FLINK-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540156#comment-14540156 ] Alexander Alexandrov commented on FLINK-1999: - [~vsldimov] K is the Document ID, not a word from the document. The `SparseVector` contains the transformed bag-of-words model for the document (originally stored in the point value of type Seq[String] in the input). TF-IDF transformer -- Key: FLINK-1999 URL: https://issues.apache.org/jira/browse/FLINK-1999 Project: Flink Issue Type: New Feature Components: Machine Learning Library Reporter: Ronny Bräunlich Assignee: Alexander Alexandrov Priority: Minor Labels: ML Hello everybody, we are a group of three students from TU Berlin (I guess we're not the first group creating an issue) and we want to/have to implement a tf-idf transformer for Flink. Our lecturer Alexander told us that we could get some guidance here and that you could point us to an old version of a similar transformer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1999) TF-IDF transformer
[ https://issues.apache.org/jira/browse/FLINK-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539539#comment-14539539 ] Alexander Alexandrov commented on FLINK-1999: - I'm not sure whether Seq[String] is enough as a type. In particular, I am not seeing how you can then classify the original items, as the transformed `DataSet` does not hold any document ID. I would therefore suggest having a more generic type {code} case class Point[K, V](id: K, value: V) {code} The type of the transformer should then be {{Point[K, Seq[String]] => Point[K, SparseVector[Double]]}}, where the point value is a {{SparseVector[Double]}} that encodes the tf-idf values of the words occurring in the document. TF-IDF transformer -- Key: FLINK-1999 URL: https://issues.apache.org/jira/browse/FLINK-1999 Project: Flink Issue Type: New Feature Components: Machine Learning Library Reporter: Ronny Bräunlich Assignee: Alexander Alexandrov Priority: Minor Labels: ML Hello everybody, we are a group of three students from TU Berlin (I guess we're not the first group creating an issue) and we want to/have to implement a tf-idf transformer for Flink. Our lecturer Alexander told us that we could get some guidance here and that you could point us to an old version of a similar transformer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
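A hedged sketch of the suggested transformer shape, assuming Breeze's {{SparseVector}} for the output; the {{Transformer}} trait and {{TfIdfTransformer}} name are hypothetical, not the flink-ml API:
{code}
import breeze.linalg.SparseVector
import org.apache.flink.api.scala._

case class Point[K, V](id: K, value: V)

trait Transformer[I, O] {
  def transform(input: DataSet[I]): DataSet[O]
}

// tf-idf keeps the document ID K and turns the bag of words into tf-idf weights:
// Point[K, Seq[String]] => Point[K, SparseVector[Double]]
abstract class TfIdfTransformer[K]
  extends Transformer[Point[K, Seq[String]], Point[K, SparseVector[Double]]]
{code}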
[jira] [Updated] (FLINK-1959) Accumulators BROKEN after Partitioning
[ https://issues.apache.org/jira/browse/FLINK-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Alexandrov updated FLINK-1959: Affects Version/s: (was: 0.8.1) master Fix Version/s: (was: 0.8.1) master Accumulators BROKEN after Partitioning -- Key: FLINK-1959 URL: https://issues.apache.org/jira/browse/FLINK-1959 Project: Flink Issue Type: Bug Components: Examples Affects Versions: master Reporter: mustafa elbehery Priority: Critical Fix For: master while running the Accumulator example in https://github.com/Elbehery/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/relational/EmptyFieldsCountAccumulator.java, I tried to alter the data flow with PartitionByHash function before applying Filter, and the resulted accumulator was NULL. By Debugging, I could see the accumulator in the RunTime Map. However, by retrieving the accumulator from the JobExecutionResult object, it was NULL. The line caused the problem is file.partitionByHash(1).filter(new EmptyFieldFilter()) instead of file.filter(new EmptyFieldFilter()) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1999) TF-IDF transformer
[ https://issues.apache.org/jira/browse/FLINK-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537853#comment-14537853 ] Alexander Alexandrov commented on FLINK-1999: - What does a Seq[String] represent? TF-IDF transformer -- Key: FLINK-1999 URL: https://issues.apache.org/jira/browse/FLINK-1999 Project: Flink Issue Type: New Feature Components: Machine Learning Library Reporter: Ronny Bräunlich Assignee: Alexander Alexandrov Priority: Minor Labels: ML Hello everybody, we are a group of three students from TU Berlin (I guess we're not the first group creating an issue) and we want to/have to implement a tf-idf transformer for Flink. Our lecturer Alexander told us that we could get some guidance here and that you could point us to an old version of a similar transformer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1959) Accumulators BROKEN after Partitioning
[ https://issues.apache.org/jira/browse/FLINK-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537845#comment-14537845 ] Alexander Alexandrov commented on FLINK-1959: - I think that some communication / traversal chain gets broken in the PartitionByHash node. You can either (1) try to dig through the code and see where this happens, or (2) use an alternative to the accumulator until the issue is resolved (e.g. write the information to a pre-defined HDFS path). Accumulators BROKEN after Partitioning -- Key: FLINK-1959 URL: https://issues.apache.org/jira/browse/FLINK-1959 Project: Flink Issue Type: Bug Components: Examples Affects Versions: 0.8.1 Reporter: mustafa elbehery Priority: Critical Fix For: 0.8.1 while running the Accumulator example in https://github.com/Elbehery/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/relational/EmptyFieldsCountAccumulator.java, I tried to alter the data flow with PartitionByHash function before applying Filter, and the resulted accumulator was NULL. By Debugging, I could see the accumulator in the RunTime Map. However, by retrieving the accumulator from the JobExecutionResult object, it was NULL. The line caused the problem is file.partitionByHash(1).filter(new EmptyFieldFilter()) instead of file.filter(new EmptyFieldFilter()) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1999) TF-IDF transformer
[ https://issues.apache.org/jira/browse/FLINK-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537358#comment-14537358 ] Alexander Alexandrov commented on FLINK-1999: - I agree with [~Felix Neutatz], you can use the PR for the feature hasher as a reference. TF-IDF transformer -- Key: FLINK-1999 URL: https://issues.apache.org/jira/browse/FLINK-1999 Project: Flink Issue Type: New Feature Components: Machine Learning Library Reporter: Ronny Bräunlich Assignee: Alexander Alexandrov Priority: Minor Labels: ML Hello everybody, we are a group of three students from TU Berlin (I guess we're not the first group creating an issue) and we want to/have to implement a tf-idf transformer for Flink. Our lecturer Alexander told us that we could get some guidance here and that you could point us to an old version of a similar transformer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1848) Paths containing a Windows drive letter cannot be used in FileOutputFormats
[ https://issues.apache.org/jira/browse/FLINK-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509282#comment-14509282 ] Alexander Alexandrov commented on FLINK-1848: - I think this should be fixed (with the PR) and can be closed now, right? Paths containing a Windows drive letter cannot be used in FileOutputFormats --- Key: FLINK-1848 URL: https://issues.apache.org/jira/browse/FLINK-1848 Project: Flink Issue Type: Bug Affects Versions: 0.9 Environment: Windows (Cygwin and native) Reporter: Fabian Hueske Assignee: Fabian Hueske Priority: Critical Fix For: 0.9 Paths that contain a Windows drive letter such as {{file:///c:/my/directory}} cannot be used as output path for {{FileOutputFormat}}. If done, the following exception is thrown: {code} Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:c: at org.apache.flink.core.fs.Path.initialize(Path.java:242) at org.apache.flink.core.fs.Path.init(Path.java:225) at org.apache.flink.core.fs.Path.init(Path.java:138) at org.apache.flink.core.fs.local.LocalFileSystem.pathToFile(LocalFileSystem.java:147) at org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:232) at org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:233) at org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:233) at org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:233) at org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:233) at org.apache.flink.core.fs.FileSystem.initOutPathLocalFS(FileSystem.java:603) at org.apache.flink.api.common.io.FileOutputFormat.open(FileOutputFormat.java:233) at org.apache.flink.api.java.io.CsvOutputFormat.open(CsvOutputFormat.java:158) at org.apache.flink.runtime.operators.DataSinkTask.invoke(DataSinkTask.java:183) at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:217) at java.lang.Thread.run(Unknown Source) Caused by: java.net.URISyntaxException: Relative path in absolute URI: file:c: at java.net.URI.checkPath(Unknown Source) at java.net.URI.init(Unknown Source) at org.apache.flink.core.fs.Path.initialize(Path.java:240) ... 14 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (FLINK-1735) Add FeatureHasher to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Alexandrov reassigned FLINK-1735: --- Assignee: Alexander Alexandrov Add FeatureHasher to machine learning library - Key: FLINK-1735 URL: https://issues.apache.org/jira/browse/FLINK-1735 Project: Flink Issue Type: New Feature Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Alexander Alexandrov Labels: ML Using the hashing trick [1,2] is a common way to vectorize arbitrary feature values. The hash of the feature value is used to calculate its index for a vector entry. In order to mitigate possible collisions, a second hashing function is used to calculate the sign for the update value which is added to the vector entry. This way, it is likely that collisions will simply cancel out. A feature hasher would also be helpful for NLP problems where it could be used to vectorize bag of words or ngrams feature vectors. Resources: [1] [https://en.wikipedia.org/wiki/Feature_hashing] [2] [http://scikit-learn.org/stable/modules/feature_extraction.html#feature-extraction] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
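A minimal, self-contained sketch of the hashing trick described in the issue above; {{String#hashCode}} (and a reversed-string hash for the sign) stand in for proper hash functions:
{code}
object FeatureHashingDemo extends App {
  def hashFeatures(features: Seq[String], numBuckets: Int): Array[Double] = {
    val vec = new Array[Double](numBuckets)
    for (f <- features) {
      val index = java.lang.Math.floorMod(f.hashCode, numBuckets) // first hash: vector index
      // second, independent hash decides the sign, so colliding features tend to cancel out
      val sign = if (java.lang.Math.floorMod(f.reverse.hashCode, 2) == 0) 1.0 else -1.0
      vec(index) += sign
    }
    vec
  }

  println(hashFeatures(Seq("flink", "ml", "flink"), numBuckets = 8).mkString(", "))
}
{code}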
[jira] [Assigned] (FLINK-1743) Add multinomial logistic regression to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Alexandrov reassigned FLINK-1743: --- Assignee: Alexander Alexandrov Add multinomial logistic regression to machine learning library --- Key: FLINK-1743 URL: https://issues.apache.org/jira/browse/FLINK-1743 Project: Flink Issue Type: New Feature Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Alexander Alexandrov Labels: ML Multinomial logistic regression [1] would be a good first classification algorithm which can classify multiple classes. Resources: [1] [http://en.wikipedia.org/wiki/Multinomial_logistic_regression] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-1736) Add CountVectorizer to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Alexandrov updated FLINK-1736: Assignee: Alexander Alexandrov Add CountVectorizer to machine learning library --- Key: FLINK-1736 URL: https://issues.apache.org/jira/browse/FLINK-1736 Project: Flink Issue Type: New Feature Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Alexander Alexandrov Labels: ML A {{CountVectorizer}} feature extractor [1] assigns each occurring word in a corpus a unique identifier. With this mapping it can vectorize models such as bag of words or ngrams in an efficient way. The unique identifier assigned to a word acts as the index of a vector. The number of word occurrences is represented as a vector value at a specific index. The advantage of the {{CountVectorizer}} compared to the FeatureHasher is that the mapping of words to indices can be obtained, which makes it easier to understand the resulting feature vectors. The {{CountVectorizer}} could be generalized to support arbitrary feature values. The {{CountVectorizer}} should be implemented as a {{Transformer}}. Resources: [1] [http://scikit-learn.org/stable/modules/feature_extraction.html#common-vectorizer-usage] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
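A minimal sketch of the {{CountVectorizer}} idea from the issue above — build the word-to-index mapping once, then count occurrences per document (plain Scala, not the proposed {{Transformer}} API):
{code}
object CountVectorizerDemo extends App {
  val corpus: Seq[Seq[String]] = Seq(
    Seq("flink", "streams", "flink"),
    Seq("flink", "batch")
  )

  // each occurring word gets a unique, stable index
  val vocabulary: Map[String, Int] = corpus.flatten.distinct.zipWithIndex.toMap

  // per document: sparse map from word index to occurrence count
  val vectors: Seq[Map[Int, Int]] =
    corpus.map(_.groupBy(identity).map { case (word, occs) => vocabulary(word) -> occs.size })

  println(vocabulary) // e.g. Map(flink -> 0, streams -> 1, batch -> 2)
  println(vectors)    // e.g. List(Map(0 -> 2, 1 -> 1), Map(0 -> 1, 2 -> 1))
}
{code}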
[jira] [Assigned] (FLINK-1731) Add kMeans clustering algorithm to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Alexandrov reassigned FLINK-1731: --- Assignee: Alexander Alexandrov Add kMeans clustering algorithm to machine learning library --- Key: FLINK-1731 URL: https://issues.apache.org/jira/browse/FLINK-1731 Project: Flink Issue Type: New Feature Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Alexander Alexandrov Labels: ML The Flink repository already contains a kMeans implementation but it is not yet ported to the machine learning library. I assume that only the used data types have to be adapted and then it can be more or less directly moved to flink-ml. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1829) Conflicting Jackson version in the Flink POMs
[ https://issues.apache.org/jira/browse/FLINK-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14492912#comment-14492912 ] Alexander Alexandrov commented on FLINK-1829: - The client project includes (with provided scope) flink-scala, flink-java, and flink-clients. Here is the dependency tree for the problematic `emma-examples` project: {{{ [INFO] [INFO] Building emma-sketchbook 1.0-SNAPSHOT [INFO] [INFO] [INFO] --- maven-dependency-plugin:2.1:tree (default-cli) @ emma-sketchbook --- [INFO] eu.stratosphere:emma-sketchbook:jar:1.0-SNAPSHOT [INFO] +- org.scala-lang:scala-library:jar:2.11.4:compile [INFO] +- org.scala-lang:scala-reflect:jar:2.11.4:compile [INFO] +- org.scala-lang:scala-compiler:jar:2.11.4:compile [INFO] | +- org.scala-lang.modules:scala-xml_2.11:jar:1.0.2:compile [INFO] | \- org.scala-lang.modules:scala-parser-combinators_2.11:jar:1.0.2:compile [INFO] +- eu.stratosphere:emma-language:jar:1.0-SNAPSHOT:compile [INFO] | +- com.assembla.scala-incubator:graph-core_2.11:jar:1.9.0:compile [INFO] | +- eu.stratosphere:emma-backend:jar:1.0-SNAPSHOT:compile [INFO] | +- eu.stratosphere:emma-common:jar:1.0-SNAPSHOT:compile [INFO] | | +- eu.stratosphere:emma-common-macros:jar:1.0-SNAPSHOT:compile [INFO] | | +- net.sf.opencsv:opencsv:jar:2.3:compile [INFO] | | \- com.typesafe.scala-logging:scala-logging-slf4j_2.11:jar:2.1.2:compile [INFO] | | \- com.typesafe.scala-logging:scala-logging-api_2.11:jar:2.1.2:compile [INFO] | +- org.apache.hadoop:hadoop-common:jar:2.2.0:compile [INFO] | | +- org.apache.hadoop:hadoop-annotations:jar:2.2.0:compile [INFO] | | | \- jdk.tools:jdk.tools:jar:1.7:system [INFO] | | +- commons-cli:commons-cli:jar:1.2:compile [INFO] | | +- org.apache.commons:commons-math:jar:2.1:compile [INFO] | | +- xmlenc:xmlenc:jar:0.52:compile [INFO] | | +- commons-httpclient:commons-httpclient:jar:3.1:compile [INFO] | | +- commons-codec:commons-codec:jar:1.4:compile [INFO] | | +- commons-io:commons-io:jar:2.4:compile [INFO] | | +- commons-net:commons-net:jar:2.2:compile [INFO] | | +- javax.servlet:servlet-api:jar:2.5:compile [INFO] | | +- org.mortbay.jetty:jetty:jar:6.1.26:compile [INFO] | | +- org.mortbay.jetty:jetty-util:jar:6.1.26:compile [INFO] | | +- com.sun.jersey:jersey-core:jar:1.9:compile [INFO] | | +- com.sun.jersey:jersey-json:jar:1.9:compile [INFO] | | | +- org.codehaus.jettison:jettison:jar:1.1:compile [INFO] | | | | \- stax:stax-api:jar:1.0.1:compile [INFO] | | | +- com.sun.xml.bind:jaxb-impl:jar:2.2.3-1:compile [INFO] | | | | \- javax.xml.bind:jaxb-api:jar:2.2.2:compile [INFO] | | | | \- javax.activation:activation:jar:1.1:compile [INFO] | | | +- org.codehaus.jackson:jackson-jaxrs:jar:1.8.3:compile [INFO] | | | \- org.codehaus.jackson:jackson-xc:jar:1.8.3:compile [INFO] | | +- com.sun.jersey:jersey-server:jar:1.9:compile [INFO] | | | \- asm:asm:jar:3.1:compile [INFO] | | +- tomcat:jasper-compiler:jar:5.5.23:runtime [INFO] | | +- commons-logging:commons-logging:jar:1.1.1:compile [INFO] | | +- net.java.dev.jets3t:jets3t:jar:0.7.1:compile [INFO] | | +- commons-lang:commons-lang:jar:2.5:compile [INFO] | | +- commons-configuration:commons-configuration:jar:1.6:compile [INFO] | | | +- commons-collections:commons-collections:jar:3.2.1:compile [INFO] | | | +- commons-digester:commons-digester:jar:1.8:compile [INFO] | | | | \- commons-beanutils:commons-beanutils:jar:1.7.0:compile [INFO] | | | \- commons-beanutils:commons-beanutils-core:jar:1.8.0:compile [INFO] | | +- org.codehaus.jackson:jackson-core-asl:jar:1.8.8:compile [INFO] | | +- 
org.codehaus.jackson:jackson-mapper-asl:jar:1.8.8:compile [INFO] | | +- org.apache.avro:avro:jar:1.7.6:compile [INFO] | | | +- com.thoughtworks.paranamer:paranamer:jar:2.3:compile [INFO] | | | \- org.xerial.snappy:snappy-java:jar:1.1.1.6:compile [INFO] | | +- com.google.protobuf:protobuf-java:jar:2.5.0:compile [INFO] | | +- org.apache.hadoop:hadoop-auth:jar:2.2.0:compile [INFO] | | +- com.jcraft:jsch:jar:0.1.42:compile [INFO] | | +- org.apache.zookeeper:zookeeper:jar:3.4.5:compile [INFO] | | \- org.apache.commons:commons-compress:jar:1.4.1:compile [INFO] | | \- org.tukaani:xz:jar:1.0:compile [INFO] | \- org.apache.hadoop:hadoop-hdfs:jar:2.2.0:compile [INFO] | +- commons-daemon:commons-daemon:jar:1.0.13:compile [INFO] | +- javax.servlet.jsp:jsp-api:jar:2.1:compile [INFO] | \- tomcat:jasper-runtime:jar:5.5.23:compile [INFO] |\- commons-el:commons-el:jar:1.0:compile [INFO] +- eu.stratosphere:emma-flink:jar:1.0-SNAPSHOT:compile [INFO] +- eu.stratosphere:emma-common:test-jar:tests:1.0-SNAPSHOT:test [INFO] +- eu.stratosphere:emma-language:test-jar:tests:1.0-SNAPSHOT:test [INFO]
[jira] [Commented] (FLINK-1829) Conflicting Jackson version in the Flink POMs
[ https://issues.apache.org/jira/browse/FLINK-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14492965#comment-14492965 ] Alexander Alexandrov commented on FLINK-1829: - Sorry, I thought that the two versions might share the same fqnames even though they differ in the groupId, which is not the case. I checked again, I think I found the error. I actually have Spark 1.2.0 with {{provided}} scope next to the Flink deps. From Spark I get {noformat} [INFO] | +- org.json4s:json4s-jackson_2.11:jar:3.2.10:provided [INFO] | | +- org.json4s:json4s-core_2.11:jar:3.2.10:provided [INFO] | | | +- org.json4s:json4s-ast_2.11:jar:3.2.10:provided [INFO] | | | +- com.thoughtworks.paranamer:paranamer:jar:2.6:provided [INFO] | | | \- org.scala-lang:scalap:jar:2.11.0:provided [INFO] | | \- com.fasterxml.jackson.core:jackson-databind:jar:2.3.1:provided [INFO] | | +- com.fasterxml.jackson.core:jackson-annotations:jar:2.3.0:provided [INFO] | | \- com.fasterxml.jackson.core:jackson-core:jar:2.3.1:provided {noformat} which conflicts with Flink's version of {{com.fasterxml.jackson.core:jackson-*}}. I guess I'll have to fix the issue locally. You can close this with 'nofix', thanks for the hint. Conflicting Jackson version in the Flink POMs - Key: FLINK-1829 URL: https://issues.apache.org/jira/browse/FLINK-1829 Project: Flink Issue Type: Bug Components: Build System Affects Versions: 0.9 Reporter: Alexander Alexandrov Assignee: Robert Metzger Fix For: 0.9 The current POM setup transitively includes multiple conflicting versions of the Jackson library over * {{com.amazonaws:aws-java-sdk}} (v. 2.1.1) * {{org.apache.avro:avro}} (v. 1.9.13) * {{org.apache.hbase:hbase-client}} (v. 1.8.8) When running jobs against a Flink local runtime embedded with: {code:xml} dependency groupIdorg.apache.flink/groupId artifactIdflink-scala/artifactId version${flink.version}/version scopeprovided/scope /dependency dependency groupIdorg.apache.flink/groupId artifactIdflink-java/artifactId version${flink.version}/version scopeprovided/scope /dependency dependency groupIdorg.apache.flink/groupId artifactIdflink-clients/artifactId version${flink.version}/version scopeprovided/scope /dependency {code} I get the following error: {noformat} 15-04-04 15:52:04 ERROR exception during creation akka.actor.ActorInitializationException: exception during creation at akka.actor.ActorInitializationException$.apply(Actor.scala:164) at akka.actor.ActorCell.create(ActorCell.scala:596) at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456) at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478) at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:279) at akka.dispatch.Mailbox.run(Mailbox.scala:220) at akka.dispatch.Mailbox.exec(Mailbox.scala:231) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at akka.util.Reflect$.instantiate(Reflect.scala:66) at 
akka.actor.ArgsReflectConstructor.produce(Props.scala:352) at akka.actor.Props.newActor(Props.scala:252) at akka.actor.ActorCell.newActor(ActorCell.scala:552) at akka.actor.ActorCell.create(ActorCell.scala:578) ... 9 more Caused by: java.lang.NoSuchMethodError: com.fasterxml.jackson.core.JsonFactory.requiresPropertyOrdering()Z at com.fasterxml.jackson.databind.ObjectMapper.init(ObjectMapper.java:445) at com.fasterxml.jackson.databind.ObjectMapper.init(ObjectMapper.java:366) at org.apache.flink.runtime.taskmanager.TaskManager.init(TaskManager.scala:134) ... 18 more {noformat} Fixing the Jackson version on the client side, e.g, with the following snippet {code:xml} dependency groupIdcom.fasterxml.jackson.core/groupId artifactIdjackson-core/artifactId version2.2.1/version scopeprovided/scope /dependency dependency groupIdcom.fasterxml.jackson.core/groupId
[jira] [Comment Edited] (FLINK-1829) Conflicting Jackson version in the Flink POMs
[ https://issues.apache.org/jira/browse/FLINK-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14492912#comment-14492912 ] Alexander Alexandrov edited comment on FLINK-1829 at 4/13/15 7:21 PM: -- The client project includes (with provided scope) flink-scala, flink-java, and flink-clients. Here is the dependency tree for the problematic `emma-examples` project: {noformat} [INFO] [INFO] Building emma-sketchbook 1.0-SNAPSHOT [INFO] [INFO] [INFO] --- maven-dependency-plugin:2.1:tree (default-cli) @ emma-sketchbook --- [INFO] eu.stratosphere:emma-sketchbook:jar:1.0-SNAPSHOT [INFO] +- org.scala-lang:scala-library:jar:2.11.4:compile [INFO] +- org.scala-lang:scala-reflect:jar:2.11.4:compile [INFO] +- org.scala-lang:scala-compiler:jar:2.11.4:compile [INFO] | +- org.scala-lang.modules:scala-xml_2.11:jar:1.0.2:compile [INFO] | \- org.scala-lang.modules:scala-parser-combinators_2.11:jar:1.0.2:compile [INFO] +- eu.stratosphere:emma-language:jar:1.0-SNAPSHOT:compile [INFO] | +- com.assembla.scala-incubator:graph-core_2.11:jar:1.9.0:compile [INFO] | +- eu.stratosphere:emma-backend:jar:1.0-SNAPSHOT:compile [INFO] | +- eu.stratosphere:emma-common:jar:1.0-SNAPSHOT:compile [INFO] | | +- eu.stratosphere:emma-common-macros:jar:1.0-SNAPSHOT:compile [INFO] | | +- net.sf.opencsv:opencsv:jar:2.3:compile [INFO] | | \- com.typesafe.scala-logging:scala-logging-slf4j_2.11:jar:2.1.2:compile [INFO] | | \- com.typesafe.scala-logging:scala-logging-api_2.11:jar:2.1.2:compile [INFO] | +- org.apache.hadoop:hadoop-common:jar:2.2.0:compile [INFO] | | +- org.apache.hadoop:hadoop-annotations:jar:2.2.0:compile [INFO] | | | \- jdk.tools:jdk.tools:jar:1.7:system [INFO] | | +- commons-cli:commons-cli:jar:1.2:compile [INFO] | | +- org.apache.commons:commons-math:jar:2.1:compile [INFO] | | +- xmlenc:xmlenc:jar:0.52:compile [INFO] | | +- commons-httpclient:commons-httpclient:jar:3.1:compile [INFO] | | +- commons-codec:commons-codec:jar:1.4:compile [INFO] | | +- commons-io:commons-io:jar:2.4:compile [INFO] | | +- commons-net:commons-net:jar:2.2:compile [INFO] | | +- javax.servlet:servlet-api:jar:2.5:compile [INFO] | | +- org.mortbay.jetty:jetty:jar:6.1.26:compile [INFO] | | +- org.mortbay.jetty:jetty-util:jar:6.1.26:compile [INFO] | | +- com.sun.jersey:jersey-core:jar:1.9:compile [INFO] | | +- com.sun.jersey:jersey-json:jar:1.9:compile [INFO] | | | +- org.codehaus.jettison:jettison:jar:1.1:compile [INFO] | | | | \- stax:stax-api:jar:1.0.1:compile [INFO] | | | +- com.sun.xml.bind:jaxb-impl:jar:2.2.3-1:compile [INFO] | | | | \- javax.xml.bind:jaxb-api:jar:2.2.2:compile [INFO] | | | | \- javax.activation:activation:jar:1.1:compile [INFO] | | | +- org.codehaus.jackson:jackson-jaxrs:jar:1.8.3:compile [INFO] | | | \- org.codehaus.jackson:jackson-xc:jar:1.8.3:compile [INFO] | | +- com.sun.jersey:jersey-server:jar:1.9:compile [INFO] | | | \- asm:asm:jar:3.1:compile [INFO] | | +- tomcat:jasper-compiler:jar:5.5.23:runtime [INFO] | | +- commons-logging:commons-logging:jar:1.1.1:compile [INFO] | | +- net.java.dev.jets3t:jets3t:jar:0.7.1:compile [INFO] | | +- commons-lang:commons-lang:jar:2.5:compile [INFO] | | +- commons-configuration:commons-configuration:jar:1.6:compile [INFO] | | | +- commons-collections:commons-collections:jar:3.2.1:compile [INFO] | | | +- commons-digester:commons-digester:jar:1.8:compile [INFO] | | | | \- commons-beanutils:commons-beanutils:jar:1.7.0:compile [INFO] | | | \- commons-beanutils:commons-beanutils-core:jar:1.8.0:compile [INFO] | | +- 
org.codehaus.jackson:jackson-core-asl:jar:1.8.8:compile [INFO] | | +- org.codehaus.jackson:jackson-mapper-asl:jar:1.8.8:compile [INFO] | | +- org.apache.avro:avro:jar:1.7.6:compile [INFO] | | | +- com.thoughtworks.paranamer:paranamer:jar:2.3:compile [INFO] | | | \- org.xerial.snappy:snappy-java:jar:1.1.1.6:compile [INFO] | | +- com.google.protobuf:protobuf-java:jar:2.5.0:compile [INFO] | | +- org.apache.hadoop:hadoop-auth:jar:2.2.0:compile [INFO] | | +- com.jcraft:jsch:jar:0.1.42:compile [INFO] | | +- org.apache.zookeeper:zookeeper:jar:3.4.5:compile [INFO] | | \- org.apache.commons:commons-compress:jar:1.4.1:compile [INFO] | | \- org.tukaani:xz:jar:1.0:compile [INFO] | \- org.apache.hadoop:hadoop-hdfs:jar:2.2.0:compile [INFO] | +- commons-daemon:commons-daemon:jar:1.0.13:compile [INFO] | +- javax.servlet.jsp:jsp-api:jar:2.1:compile [INFO] | \- tomcat:jasper-runtime:jar:5.5.23:compile [INFO] |\- commons-el:commons-el:jar:1.0:compile [INFO] +- eu.stratosphere:emma-flink:jar:1.0-SNAPSHOT:compile [INFO] +- eu.stratosphere:emma-common:test-jar:tests:1.0-SNAPSHOT:test [INFO] +-
[jira] [Comment Edited] (FLINK-1829) Conflicting Jackson version in the Flink POMs
[ https://issues.apache.org/jira/browse/FLINK-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14492932#comment-14492932 ] Alexander Alexandrov edited comment on FLINK-1829 at 4/13/15 7:33 PM:
--

Correct, HBase is out of scope, but we still get both jackson 2.1.1 and 1.9.13 through the {{com.amazonaws:aws-java-sdk}} and {{org.apache.avro:avro}} dependencies respectively.

was (Author: aalexandrov):
Correct, HBase is out of scope, but we still get both jackson 2.1.1 and 1.9.13 through the {com.amazonaws:aws-java-sdk} and {org.apache.avro:avro} dependencies respectively.

Conflicting Jackson version in the Flink POMs
-

Key: FLINK-1829
URL: https://issues.apache.org/jira/browse/FLINK-1829
Project: Flink
Issue Type: Bug
Components: Build System
Affects Versions: 0.9
Reporter: Alexander Alexandrov
Assignee: Robert Metzger
Fix For: 0.9

The current POM setup transitively includes multiple conflicting versions of the Jackson library via
* {{com.amazonaws:aws-java-sdk}} (v. 2.1.1)
* {{org.apache.avro:avro}} (v. 1.9.13)
* {{org.apache.hbase:hbase-client}} (v. 1.8.8)

When running jobs against a Flink local runtime embedded with:

{code:xml}
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-scala</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
{code}

I get the following error:

{noformat}
15-04-04 15:52:04 ERROR exception during creation
akka.actor.ActorInitializationException: exception during creation
    at akka.actor.ActorInitializationException$.apply(Actor.scala:164)
    at akka.actor.ActorCell.create(ActorCell.scala:596)
    at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456)
    at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
    at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:279)
    at akka.dispatch.Mailbox.run(Mailbox.scala:220)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at akka.util.Reflect$.instantiate(Reflect.scala:66)
    at akka.actor.ArgsReflectConstructor.produce(Props.scala:352)
    at akka.actor.Props.newActor(Props.scala:252)
    at akka.actor.ActorCell.newActor(ActorCell.scala:552)
    at akka.actor.ActorCell.create(ActorCell.scala:578)
    ... 9 more
Caused by: java.lang.NoSuchMethodError: com.fasterxml.jackson.core.JsonFactory.requiresPropertyOrdering()Z
    at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:445)
    at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:366)
    at org.apache.flink.runtime.taskmanager.TaskManager.<init>(TaskManager.scala:134)
    ... 18 more
{noformat}

Fixing the Jackson version on the client side, e.g., with the following snippet

{code:xml}
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-core</artifactId>
    <version>2.2.1</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.2.1</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-annotations</artifactId>
    <version>2.2.1</version>
    <scope>provided</scope>
</dependency>
{code}

solves the problem, but I guess it would be better if we could stick to one version in the build artifacts.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
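One way to enforce a single Jackson version across all modules would be a {{dependencyManagement}} entry in the parent POM. A minimal sketch, reusing the 2.2.1 version from the snippet above; the parent-POM placement is an assumption, not something decided in this thread:

{code:xml}
<!-- Sketch: pin all com.fasterxml.jackson.core artifacts to one version. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-core</artifactId>
      <version>2.2.1</version>
    </dependency>
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
      <version>2.2.1</version>
    </dependency>
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-annotations</artifactId>
      <version>2.2.1</version>
    </dependency>
  </dependencies>
</dependencyManagement>
{code}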
[jira] [Commented] (FLINK-1829) Conflicting Jackson version in the Flink POMs
[ https://issues.apache.org/jira/browse/FLINK-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14492938#comment-14492938 ] Alexander Alexandrov commented on FLINK-1829:
-

I am actually running from IntelliJ using a TestCase that starts the job. I'll post the classpath with and without the {{jackson}} enforcement on the client POM in a minute.

Conflicting Jackson version in the Flink POMs
-

Key: FLINK-1829
URL: https://issues.apache.org/jira/browse/FLINK-1829
Project: Flink
Issue Type: Bug
Components: Build System
Affects Versions: 0.9
Reporter: Alexander Alexandrov
Assignee: Robert Metzger
Fix For: 0.9

The current POM setup transitively includes multiple conflicting versions of the Jackson library via
* {{com.amazonaws:aws-java-sdk}} (v. 2.1.1)
* {{org.apache.avro:avro}} (v. 1.9.13)
* {{org.apache.hbase:hbase-client}} (v. 1.8.8)

When running jobs against a Flink local runtime embedded with:

{code:xml}
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-scala</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
{code}

I get the following error:

{noformat}
15-04-04 15:52:04 ERROR exception during creation
akka.actor.ActorInitializationException: exception during creation
    at akka.actor.ActorInitializationException$.apply(Actor.scala:164)
    at akka.actor.ActorCell.create(ActorCell.scala:596)
    at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456)
    at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
    at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:279)
    at akka.dispatch.Mailbox.run(Mailbox.scala:220)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at akka.util.Reflect$.instantiate(Reflect.scala:66)
    at akka.actor.ArgsReflectConstructor.produce(Props.scala:352)
    at akka.actor.Props.newActor(Props.scala:252)
    at akka.actor.ActorCell.newActor(ActorCell.scala:552)
    at akka.actor.ActorCell.create(ActorCell.scala:578)
    ... 9 more
Caused by: java.lang.NoSuchMethodError: com.fasterxml.jackson.core.JsonFactory.requiresPropertyOrdering()Z
    at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:445)
    at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:366)
    at org.apache.flink.runtime.taskmanager.TaskManager.<init>(TaskManager.scala:134)
    ... 18 more
{noformat}

Fixing the Jackson version on the client side, e.g., with the following snippet

{code:xml}
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-core</artifactId>
    <version>2.2.1</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.2.1</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-annotations</artifactId>
    <version>2.2.1</version>
    <scope>provided</scope>
</dependency>
{code}

solves the problem, but I guess it would be better if we could stick to one version in the build artifacts.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
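A minimal sketch of how such a classpath listing can be captured from inside the test JVM; the Jackson filter is illustrative, not part of the original comment:

{code:java}
// Print only the Jackson-related entries of the effective classpath.
public static void printJacksonClasspathEntries() {
    String classpath = System.getProperty("java.class.path");
    for (String entry : classpath.split(java.io.File.pathSeparator)) {
        if (entry.contains("jackson")) {
            System.out.println(entry);
        }
    }
}
{code}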
[jira] [Commented] (FLINK-1829) Conflicting Jackson version in the Flink POMs
[ https://issues.apache.org/jira/browse/FLINK-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14492932#comment-14492932 ] Alexander Alexandrov commented on FLINK-1829:
-

Correct, HBase is out of scope, but we still get both jackson 2.1.1 and 1.9.13 through the {com.amazonaws:aws-java-sdk} and {org.apache.avro:avro} dependencies respectively.

Conflicting Jackson version in the Flink POMs
-

Key: FLINK-1829
URL: https://issues.apache.org/jira/browse/FLINK-1829
Project: Flink
Issue Type: Bug
Components: Build System
Affects Versions: 0.9
Reporter: Alexander Alexandrov
Assignee: Robert Metzger
Fix For: 0.9

The current POM setup transitively includes multiple conflicting versions of the Jackson library via
* {{com.amazonaws:aws-java-sdk}} (v. 2.1.1)
* {{org.apache.avro:avro}} (v. 1.9.13)
* {{org.apache.hbase:hbase-client}} (v. 1.8.8)

When running jobs against a Flink local runtime embedded with:

{code:xml}
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-scala</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
{code}

I get the following error:

{noformat}
15-04-04 15:52:04 ERROR exception during creation
akka.actor.ActorInitializationException: exception during creation
    at akka.actor.ActorInitializationException$.apply(Actor.scala:164)
    at akka.actor.ActorCell.create(ActorCell.scala:596)
    at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456)
    at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
    at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:279)
    at akka.dispatch.Mailbox.run(Mailbox.scala:220)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at akka.util.Reflect$.instantiate(Reflect.scala:66)
    at akka.actor.ArgsReflectConstructor.produce(Props.scala:352)
    at akka.actor.Props.newActor(Props.scala:252)
    at akka.actor.ActorCell.newActor(ActorCell.scala:552)
    at akka.actor.ActorCell.create(ActorCell.scala:578)
    ... 9 more
Caused by: java.lang.NoSuchMethodError: com.fasterxml.jackson.core.JsonFactory.requiresPropertyOrdering()Z
    at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:445)
    at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:366)
    at org.apache.flink.runtime.taskmanager.TaskManager.<init>(TaskManager.scala:134)
    ... 18 more
{noformat}

Fixing the Jackson version on the client side, e.g., with the following snippet

{code:xml}
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-core</artifactId>
    <version>2.2.1</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.2.1</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-annotations</artifactId>
    <version>2.2.1</version>
    <scope>provided</scope>
</dependency>
{code}

solves the problem, but I guess it would be better if we could stick to one version in the build artifacts.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-1829) Conflicting Jackson version in the Flink POMs
Alexander Alexandrov created FLINK-1829:
---

Summary: Conflicting Jackson version in the Flink POMs
Key: FLINK-1829
URL: https://issues.apache.org/jira/browse/FLINK-1829
Project: Flink
Issue Type: Bug
Components: Build System
Affects Versions: 0.9
Reporter: Alexander Alexandrov
Fix For: 0.9

The current POM setup transitively includes multiple conflicting versions of the Jackson library via
* {{com.amazonaws:aws-java-sdk}} (v. 2.1.1)
* {{org.apache.avro:avro}} (v. 1.9.13)
* {{org.apache.hbase:hbase-client}} (v. 1.8.8)

When running jobs against a Flink local runtime embedded with:

{code:xml}
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-scala</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
{code}

I get the following error:

{noformat}
15-04-04 15:52:04 ERROR exception during creation
akka.actor.ActorInitializationException: exception during creation
    at akka.actor.ActorInitializationException$.apply(Actor.scala:164)
    at akka.actor.ActorCell.create(ActorCell.scala:596)
    at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456)
    at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
    at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:279)
    at akka.dispatch.Mailbox.run(Mailbox.scala:220)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at akka.util.Reflect$.instantiate(Reflect.scala:66)
    at akka.actor.ArgsReflectConstructor.produce(Props.scala:352)
    at akka.actor.Props.newActor(Props.scala:252)
    at akka.actor.ActorCell.newActor(ActorCell.scala:552)
    at akka.actor.ActorCell.create(ActorCell.scala:578)
    ... 9 more
Caused by: java.lang.NoSuchMethodError: com.fasterxml.jackson.core.JsonFactory.requiresPropertyOrdering()Z
    at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:445)
    at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:366)
    at org.apache.flink.runtime.taskmanager.TaskManager.<init>(TaskManager.scala:134)
    ... 18 more
{noformat}

Fixing the Jackson version on the client side, e.g., with the following snippet

{code:xml}
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-core</artifactId>
    <version>2.2.1</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.2.1</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-annotations</artifactId>
    <version>2.2.1</version>
    <scope>provided</scope>
</dependency>
{code}

solves the problem, but I guess it would be better if we could stick to one version in the build artifacts.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
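To trace which POMs pull in the conflicting versions, the Maven dependency plugin can print the tree filtered to the Jackson group ids (a diagnostic suggestion, not part of the original report):

{noformat}
mvn dependency:tree -Dincludes=com.fasterxml.jackson.core,org.codehaus.jackson
{noformat}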
[jira] [Commented] (FLINK-967) Make intermediate results a first-class citizen in the JobGraph
[ https://issues.apache.org/jira/browse/FLINK-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392980#comment-14392980 ] Alexander Alexandrov commented on FLINK-967:
-

[~StephanEwen] what is the status on this?

Make intermediate results a first-class citizen in the JobGraph
---

Key: FLINK-967
URL: https://issues.apache.org/jira/browse/FLINK-967
Project: Flink
Issue Type: New Feature
Components: JobManager, TaskManager
Affects Versions: 0.6-incubating
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Fix For: 0.9

In order to add incremental plan rollout to the system, we need to make intermediate results a first-class citizen in the job graph and scheduler.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (FLINK-1789) Allow adding of URLs to the usercode class loader
[ https://issues.apache.org/jira/browse/FLINK-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390800#comment-14390800 ] Alexander Alexandrov edited comment on FLINK-1789 at 4/1/15 3:29 PM:
-

The problem with that approach is that code loaded by the regular JVM class loader cannot refer to job-specific types (which can be accessed only at the UserCodeClassLoader level). Unfortunately, this is the case if we use the classpath entry to generate the dataflows dynamically at runtime.

My original hack was therefore to hardcode a filesystem path next to the list of jars when initializing the BlobManager entry, and I wanted to open an issue which makes this list parameterizable via an additional ExecutionEnvironment argument (this is basically the only main feature which prohibits the use of Emma with off-the-shelf Flink). This, of course, would require that the folders are shared (e.g. via NFS) between client, master, and workers.

I think what made Stephan so excited is the idea of using the same URL mechanism in order to ship the code to all dependent parties (most probably by running a dedicated HTTP or FTP server on the client).

was (Author: aalexandrov):
The problem with that approach is that code loaded by the regular JVM class loader cannot refer to job-specific types (which can be accessed only at the UserCodeClassLoader level). Unfortunately, this is the case if we use the classpath entry to generate the dataflows dynamically at runtime.

My original hack was therefore to hardcode a filesystem path next to the list of jars when initializing the BlobManager, and I wanted to open an issue which makes this configurable when initializing the execution environment (this is basically the only main feature which prohibits the use of Emma with off-the-shelf Flink). This, of course, would require that the folders are shared (e.g. via NFS) between client, master, and workers.

I think what made Stephan so excited is the idea of using the same URL mechanism in order to ship the code to all dependent parties (most probably by running a dedicated HTTP or FTP server on the client).

Allow adding of URLs to the usercode class loader
-

Key: FLINK-1789
URL: https://issues.apache.org/jira/browse/FLINK-1789
Project: Flink
Issue Type: Improvement
Components: Distributed Runtime
Reporter: Timo Walther
Assignee: Timo Walther
Priority: Minor

Currently, there is no option to add custom classpath URLs to the FlinkUserCodeClassLoader. JARs always need to be shipped to the cluster even if they are already present on all nodes. It would be great if RemoteEnvironment also accepted valid classpath URLs and forwarded them to the BlobLibraryCacheManager.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
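To illustrate the URL mechanism discussed above, a minimal, self-contained sketch; the jar URL and class name are hypothetical:

{code:java}
import java.net.URL;
import java.net.URLClassLoader;

public class UrlClassLoadingSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical: a jar served by the client that master and workers
        // can also reach (e.g. over HTTP, FTP, or a shared NFS path).
        URL[] userCodeUrls = { new URL("http://client-host:8080/udfs.jar") };
        try (URLClassLoader userCodeClassLoader =
                 new URLClassLoader(userCodeUrls, UrlClassLoadingSketch.class.getClassLoader())) {
            // Job-specific types resolve only through this loader,
            // not through the regular JVM class loader.
            Class<?> udfClass = userCodeClassLoader.loadClass("org.example.MyGeneratedMapper");
            System.out.println("Loaded: " + udfClass.getName());
        }
    }
}
{code}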
[jira] [Commented] (FLINK-1741) Add Jaccard Similarity Metric Example
[ https://issues.apache.org/jira/browse/FLINK-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367867#comment-14367867 ] Alexander Alexandrov commented on FLINK-1741:
-

I think we had some similar code for this in one of the ETL jobs in the IMPRO repository.

Add Jaccard Similarity Metric Example
-

Key: FLINK-1741
URL: https://issues.apache.org/jira/browse/FLINK-1741
Project: Flink
Issue Type: Task
Components: Gelly
Affects Versions: 0.9
Reporter: Andra Lungu
Assignee: Andra Lungu

http://www.inside-r.org/packages/cran/igraph/docs/similarity

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
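For reference, the metric itself is straightforward; a sketch of the textbook definition on neighbor-id sets (unrelated to the IMPRO code mentioned above):

{code:java}
// Jaccard similarity of two neighbor sets: |A intersect B| / |A union B|.
public static double jaccard(java.util.Set<Long> a, java.util.Set<Long> b) {
    java.util.Set<Long> intersection = new java.util.HashSet<>(a);
    intersection.retainAll(b);
    int unionSize = a.size() + b.size() - intersection.size();
    return unionSize == 0 ? 0.0 : (double) intersection.size() / unionSize;
}
{code}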
[jira] [Commented] (FLINK-1713) Add support for blocking data exchange in closed loop iterations
[ https://issues.apache.org/jira/browse/FLINK-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14366205#comment-14366205 ] Alexander Alexandrov commented on FLINK-1713:
-

I think he means Flink's native iteration support, i.e. execution plans which are wrapped in IterationHead and IterationTail operators.

Add support for blocking data exchange in closed loop iterations
-

Key: FLINK-1713
URL: https://issues.apache.org/jira/browse/FLINK-1713
Project: Flink
Issue Type: Improvement
Components: Distributed Runtime
Affects Versions: master
Reporter: Ufuk Celebi
Priority: Minor

The way that blocking intermediate results are currently managed prohibits them from being used inside of closed loops. A blocking result has to be fully produced before its receivers are deployed, and there is no notion of single iterations etc.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
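For readers unfamiliar with the native iteration support being referred to, a minimal bulk-iteration sketch; the data source and step function are placeholders:

{code:java}
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.operators.IterativeDataSet;

public class BulkIterationSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<Long> initial = env.generateSequence(0, 999);

        // The plan between iterate() and closeWith() becomes the loop body,
        // wrapped in IterationHead/IterationTail operators at runtime.
        IterativeDataSet<Long> loop = initial.iterate(10);
        DataSet<Long> result = loop.closeWith(loop.map(new MapFunction<Long, Long>() {
            @Override
            public Long map(Long value) {
                return value + 1; // placeholder step function
            }
        }));
    }
}
{code}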
[jira] [Created] (FLINK-1613) Cannot submit to remote ExecutionEnvironment from IDE
Alexander Alexandrov created FLINK-1613:
---

Summary: Cannot submit to remote ExecutionEnvironment from IDE
Key: FLINK-1613
URL: https://issues.apache.org/jira/browse/FLINK-1613
Project: Flink
Issue Type: Bug
Components: Distributed Runtime
Affects Versions: 0.8.1
Environment: * Ubuntu Linux 14.04
* Flink 0.9-SNAPSHOT or 0.8.1 running in standalone mode on localhost
Reporter: Alexander Alexandrov
Fix For: 0.9, 0.8.2

I am reporting this as [~rmetzger] mentioned offline that it was working in the past. At the moment it is not possible to submit jobs directly from the IDE. Both the Java and the Scala quickstart guides fail on both 0.8.1 and 0.9-SNAPSHOT with ClassNotFoundException exceptions.

To reproduce the error, run the quickstart scripts and change the ExecutionEnvironment initialization:

{code:java}
env = ExecutionEnvironment.createRemoteEnvironment("localhost", 6123);
{code}

This is the cause for Java:

{noformat}
Caused by: java.lang.ClassNotFoundException: org.myorg.quickstart.WordCount$LineSplitter
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:274)
    at org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:54)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:274)
    at org.apache.flink.util.InstantiationUtil.readObjectFromConfig(InstantiationUtil.java:236)
    at org.apache.flink.runtime.operators.util.TaskConfig.getStubWrapper(TaskConfig.java:281)
{noformat}

This is for Scala:

{noformat}
java.lang.ClassNotFoundException: org.myorg.quickstart.WordCount$$anon$2$$anon$1
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:274)
    at org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:54)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:274)
    at org.apache.flink.util.InstantiationUtil.readObjectFromConfig(InstantiationUtil.java:236)
    at org.apache.flink.api.java.typeutils.runtime.RuntimeSerializerFactory.readParametersFromConfig(RuntimeSerializerFactory.java:76)
    at org.apache.flink.runtime.operators.util.TaskConfig.getTypeSerializerFactory(TaskConfig.java:1084)
{noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
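For the record, passing the job jar(s) to the remote environment ships the classes and avoids the {{ClassNotFoundException}}; a sketch (the jar path is illustrative), though it requires building the jar first and so defeats the run-from-IDE convenience this issue is about:

{code:java}
// The varargs overload ships the listed jars to the JobManager.
ExecutionEnvironment env = ExecutionEnvironment.createRemoteEnvironment(
    "localhost", 6123, "target/quickstart-0.1.jar");
{code}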
[jira] [Created] (FLINK-1464) Added ResultTypeQueryable interface to TypeSerializerInputFormat.
Alexander Alexandrov created FLINK-1464:
---

Summary: Added ResultTypeQueryable interface to TypeSerializerInputFormat.
Key: FLINK-1464
URL: https://issues.apache.org/jira/browse/FLINK-1464
Project: Flink
Issue Type: Improvement
Components: Distributed Runtime, Optimizer
Affects Versions: 0.8, 0.9, 0.8.1
Reporter: Alexander Alexandrov
Assignee: Alexander Alexandrov
Priority: Minor
Fix For: 0.9, 0.8.1

It is currently impossible to use the {{TypeSerializerInputFormat}} with generic Tuple types. For example, [this example gist|https://gist.github.com/aalexandrov/90bf21f66bf604676f37] fails with the following exception:

{quote}
Exception in thread "main" org.apache.flink.api.common.InvalidProgramException: The type returned by the input format could not be automatically determined. Please specify the TypeInformation of the produced type explicitly.
	at org.apache.flink.api.java.ExecutionEnvironment.readFile(ExecutionEnvironment.java:341)
	at SerializedFormatExample$.main(SerializedFormatExample.scala:48)
	at SerializedFormatExample.main(SerializedFormatExample.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
{quote}

To fix the issue, I changed the constructor to take a {{TypeInformation<T>}} instead of a {{TypeSerializer<T>}} argument. If this is indeed a bug, I think that this is a good solution. Unfortunately the fix breaks the API. Feel free to change it if you find a more elegant solution compatible with the 0.8 branch.

The suggested fix can be found in the GitHub [PR#349|https://github.com/apache/flink/pull/349].

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
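A sketch of what usage looks like with the suggested fix, i.e. constructing the format from a {{TypeInformation}}; the tuple fields and input path are illustrative:

{code:java}
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.TypeSerializerInputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.typeutils.TupleTypeInfo;

public class SerializedFormatSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // With the fix the format can report its produced type via
        // ResultTypeQueryable, so the type no longer has to be guessed.
        TupleTypeInfo<Tuple2<Long, Double>> typeInfo =
            TupleTypeInfo.getBasicTupleTypeInfo(Long.class, Double.class);
        TypeSerializerInputFormat<Tuple2<Long, Double>> format =
            new TypeSerializerInputFormat<Tuple2<Long, Double>>(typeInfo);
        format.setFilePath("file:///tmp/serialized-input"); // illustrative path
        DataSet<Tuple2<Long, Double>> data = env.createInput(format, typeInfo);
    }
}
{code}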
[jira] [Commented] (FLINK-1422) Missing usage example for withParameters
[ https://issues.apache.org/jira/browse/FLINK-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296986#comment-14296986 ] Alexander Alexandrov commented on FLINK-1422:
-

+1 from me as well. I recently found out that one can also use constructor arguments to pass UDF parameters from the driver if the UDF is defined as a static final class (i.e. the UDF instance can be serialized). I think that this should be covered here as well for the sake of completeness.

Missing usage example for withParameters
--

Key: FLINK-1422
URL: https://issues.apache.org/jira/browse/FLINK-1422
Project: Flink
Issue Type: Improvement
Components: Documentation
Affects Versions: 0.8
Reporter: Alexander Alexandrov
Assignee: Chesnay Schepler
Priority: Trivial
Fix For: 0.8.1
Original Estimate: 1h
Remaining Estimate: 1h

I am struggling to find a usage example of the withParameters method in the documentation. At the moment I only see this note:

{quote}
Note: As the content of broadcast variables is kept in-memory on each node, it should not become too large. For simpler things like scalar values you can simply make parameters part of the closure of a function, or use the withParameters(...) method to pass in a configuration.
{quote}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
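The constructor-argument variant mentioned in the comment above could look like this; class and parameter names are illustrative (shown as a top-level class, a nested UDF would additionally need to be static):

{code:java}
import org.apache.flink.api.common.functions.MapFunction;

// Parameters travel with the serialized UDF instance, so no extra
// Configuration plumbing is needed; the class must be serializable.
public final class Scale implements MapFunction<Double, Double> {
    private final double factor;

    public Scale(double factor) {
        this.factor = factor;
    }

    @Override
    public Double map(Double value) {
        return value * factor;
    }
}

// Usage from the driver: data.map(new Scale(2.0));
{code}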
[jira] [Commented] (FLINK-1427) Configuration through environment variables
[ https://issues.apache.org/jira/browse/FLINK-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285744#comment-14285744 ] Alexander Alexandrov commented on FLINK-1427:
-

I am not too fond of this idea. I like having one clean entry point to the configuration of the system and a clear section documenting how I can change that configuration. One of the main issues I always have when dealing with Spark / Hadoop is deciding where and how to configure what.

We actually had an environment-based approach in the project in the past, but removed it in order to simplify the configuration options and have them all in one place.

I suggest thinking again about this one, as it will also require at least (1) making sure that the documentation explains that there are multiple equivalent setup options, and (2) ensuring that these setup options are actually equivalent.

Configuration through environment variables
---

Key: FLINK-1427
URL: https://issues.apache.org/jira/browse/FLINK-1427
Project: Flink
Issue Type: Improvement
Components: Local Runtime
Environment: Deployment
Reporter: Max Michels
Priority: Minor
Labels: configuration, deployment

Like Hadoop or Spark, etc., Flink should support configuration via shell environment variables. In cluster setups, this makes things a lot easier because writing config files can be omitted. Many automation tools (e.g. Google's bdutil) use (or abuse) this feature. For example, to set up the task manager heap size, we would run `export FLINK_TASKMANAGER_HEAP=4096` before starting the task manager on a node to set the heap memory size to 4096MB. Environment variables should overwrite the regular config entries.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1426) JobManager AJAX requests sometimes fail
[ https://issues.apache.org/jira/browse/FLINK-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285683#comment-14285683 ] Alexander Alexandrov commented on FLINK-1426:
-

He is, he has just not opened a JIRA issue about that yet. A preview should be pushed to his repo in the next few days: https://github.com/MatzeDS/incubator-flink

We will announce it when it's done.

Regards,
Alexander

JobManager AJAX requests sometimes fail
---

Key: FLINK-1426
URL: https://issues.apache.org/jira/browse/FLINK-1426
Project: Flink
Issue Type: Bug
Components: JobManager, Webfrontend
Reporter: Robert Metzger

It seems that the JobManager sometimes (I think when accessing it the first time) does not show the number of TMs / slots. A simple workaround is re-loading it, but still, users are complaining about it.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1426) JobManager AJAX requests sometimes fail
[ https://issues.apache.org/jira/browse/FLINK-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285657#comment-14285657 ] Alexander Alexandrov commented on FLINK-1426:
-

Just to restate - I have a student who is rewriting this as part of his bachelor thesis. The goal here is to make a PR for a unified version of the JobManager and the SubmissionTool in the next 4-6 weeks. We should avoid investing time solving bugs in code that might change in the near future. I will ping him and let him introduce himself here.

JobManager AJAX requests sometimes fail
---

Key: FLINK-1426
URL: https://issues.apache.org/jira/browse/FLINK-1426
Project: Flink
Issue Type: Bug
Components: JobManager, Webfrontend
Reporter: Robert Metzger

It seems that the JobManager sometimes (I think when accessing it the first time) does not show the number of TMs / slots. A simple workaround is re-loading it, but still, users are complaining about it.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-1422) Missing usage example for withParameters
Alexander Alexandrov created FLINK-1422:
---

Summary: Missing usage example for withParameters
Key: FLINK-1422
URL: https://issues.apache.org/jira/browse/FLINK-1422
Project: Flink
Issue Type: Improvement
Components: Documentation
Affects Versions: 0.8
Reporter: Alexander Alexandrov
Priority: Trivial
Fix For: 0.8.1

I am struggling to find a usage example of the withParameters method in the documentation. At the moment I only see this note:

{quote}
Note: As the content of broadcast variables is kept in-memory on each node, it should not become too large. For simpler things like scalar values you can simply make parameters part of the closure of a function, or use the withParameters(...) method to pass in a configuration.
{quote}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
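A minimal sketch of what such a usage example could look like, based on the {{withParameters(...)}}/{{open(...)}} pattern; the key and values are illustrative, and {{data}} is assumed to be an existing {{DataSet<Integer>}}:

{code:java}
import org.apache.flink.api.common.functions.RichFilterFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.configuration.Configuration;

Configuration config = new Configuration();
config.setInteger("limit", 16);

DataSet<Integer> filtered = data
    .filter(new RichFilterFunction<Integer>() {
        private int limit;

        @Override
        public void open(Configuration parameters) {
            // The configuration passed via withParameters() shows up here.
            limit = parameters.getInteger("limit", 0);
        }

        @Override
        public boolean filter(Integer value) {
            return value <= limit;
        }
    })
    .withParameters(config);
{code}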