[jira] [Comment Edited] (FLINK-20043) Add flink-sql-connector-kinesis package

2020-11-12 Thread Alexander Alexandrov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230725#comment-17230725
 ] 

Alexander Alexandrov edited comment on FLINK-20043 at 11/12/20, 4:14 PM:
-

I can verify that the following dependencies are required and not including 
them will result in compile-time errors: 

* 
[jackson.*|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/util/AWSUtil.java#L46-L50]
* 
[guava|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java#L41-L44]
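
For illustration, the kind of direct references that create these compile-time 
dependencies looks roughly like this (a minimal sketch, not the actual connector 
code; the exact usages in {{AWSUtil}} and {{FlinkKinesisProducer}} are in the 
links above):

{code:java}
// Sketch only: a class like this stops compiling as soon as jackson-databind or
// guava is removed from the compile classpath, which is the effect described above.
import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.common.base.Preconditions;

public class DependencyUsageSketch {
    public static void main(String[] args) {
        ObjectMapper mapper = new ObjectMapper();      // direct Jackson usage
        Preconditions.checkNotNull(mapper, "mapper");  // direct Guava usage
        System.out.println("compiles only with jackson-databind and guava present");
    }
}
{code}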

I am struggling to identify a code path that would require {{joda-time}}, 
though. I cannot even figure out which {{pom.xml}} declares the dependency on 
{{joda-time}}.

 


was (Author: aalexandrov):
I can verify that the following dependencies are required and not including 
them will result in compile-time errors: 

* 
[jackson.*|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/util/AWSUtil.java#L46-L50].
* 
[guava|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java#L41-L44]

I am struggling to identify a code path that would require {{joda-time}}, 
though. I cannot even figure out which {{pom.xml}} declares the dependency on 
{{joda-time}}.

 

> Add flink-sql-connector-kinesis package
> ---
>
> Key: FLINK-20043
> URL: https://issues.apache.org/jira/browse/FLINK-20043
> Project: Flink
>  Issue Type: Improvement
>Reporter: Alexander Alexandrov
>Assignee: Alexander Alexandrov
>Priority: Major
>  Labels: pull-request-available
> Attachments: third-party-report.html
>
>
> This is a follow up item after the recent addition of [Kinesis SQL source and 
> sink in PR #13770|https://github.com/apache/flink/pull/13770].
> Create a package that bundles a fat connector jar that can be used by SQL 
> clients. See FLINK-11026 and the related PRs for a discussion how to do that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-20043) Add flink-sql-connector-kinesis package

2020-11-12 Thread Alexander Alexandrov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230725#comment-17230725
 ] 

Alexander Alexandrov edited comment on FLINK-20043 at 11/12/20, 4:12 PM:
-

I can verify that the following dependencies are required and not including 
them will result in compile-time errors: 

* 
[jackson.*|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/util/AWSUtil.java#L46-L50].
* 
[guava|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java#L41-L44]

I am struggling to identify a code path that would require {{joda-time}}, 
though. I cannot even figure out which {{pom.xml}} declares the dependency on 
{{joda-time}}.

 


was (Author: aalexandrov):
I can verify that [Jackson is required at 
runtime|https://github.com/apache/flink/blob/bbcd0c791371c2c6b3e477a83adfbd78dbee2602/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/util/AWSUtil.java#L46-L50]
 and we will hit errors if we don't include it.

I am struggling to identify a code path that would require {{joda-time}}, 
though. I cannot even figure out which {{pom.xml}} declares the dependency on 
{{joda-time}}. 

 

> Add flink-sql-connector-kinesis package
> ---
>
> Key: FLINK-20043
> URL: https://issues.apache.org/jira/browse/FLINK-20043
> Project: Flink
>  Issue Type: Improvement
>Reporter: Alexander Alexandrov
>Assignee: Alexander Alexandrov
>Priority: Major
>  Labels: pull-request-available
> Attachments: third-party-report.html
>
>
> This is a follow up item after the recent addition of [Kinesis SQL source and 
> sink in PR #13770|https://github.com/apache/flink/pull/13770].
> Create a package that bundles a fat connector jar that can be used by SQL 
> clients. See FLINK-11026 and the related PRs for a discussion how to do that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-20043) Add flink-sql-connector-kinesis package

2020-11-12 Thread Alexander Alexandrov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230725#comment-17230725
 ] 

Alexander Alexandrov commented on FLINK-20043:
--

I can verify that [Jackson is required at 
runtime|https://github.com/apache/flink/blob/bbcd0c791371c2c6b3e477a83adfbd78dbee2602/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/util/AWSUtil.java#L46-L50]
 and we will hit errors if we don't include it.

I am struggling to identify a code path that would require {{joda-time}}, 
though. I cannot even figure out which {{pom.xml}} declares the dependency on 
{{joda-time}}. 

 

> Add flink-sql-connector-kinesis package
> ---
>
> Key: FLINK-20043
> URL: https://issues.apache.org/jira/browse/FLINK-20043
> Project: Flink
>  Issue Type: Improvement
>Reporter: Alexander Alexandrov
>Assignee: Alexander Alexandrov
>Priority: Major
>  Labels: pull-request-available
> Attachments: third-party-report.html
>
>
> This is a follow up item after the recent addition of [Kinesis SQL source and 
> sink in PR #13770|https://github.com/apache/flink/pull/13770].
> Create a package that bundles a fat connector jar that can be used by SQL 
> clients. See FLINK-11026 and the related PRs for a discussion how to do that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-20043) Add flink-sql-connector-kinesis package

2020-11-09 Thread Alexander Alexandrov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228568#comment-17228568
 ] 

Alexander Alexandrov commented on FLINK-20043:
--

Not really, even for a simple Kinesis source → Print sink dataflow such as


{code:sql}
 CREATE TABLE `mm-10139-source` (
  `event_time` TIMESTAMP(3) NOT NULL,
  `name` VARCHAR(32) NOT NULL,
  `age` BIGINT NOT NULL,
  `office` VARCHAR(255) NOT NULL,
  `role` VARCHAR(4) NOT NULL,
  `arrival_time` TIMESTAMP(3) METADATA FROM 'timestamp' VIRTUAL,
  `shard_id` VARCHAR(128) NOT NULL METADATA FROM 'shard-id' VIRTUAL,
  WATERMARK FOR `event_time` AS `event_time` - INTERVAL '5' second
) PARTITIONED BY (office, role) WITH (
  'connector' = 'kinesis',
  'stream' = 'mm-10139-source',
  'aws.region' = 'us-east-2',
  'scan.stream.initpos' = 'LATEST',
  'sink.partitioner-field-delimiter' = ';',
  'sink.producer.collection-max-count' = '100',
  'format' = 'json',
  'json.timestamp-format.standard' = 'ISO-8601'
);

CREATE TABLE `print-source` (
  `event_time` TIMESTAMP(3),
  `name` VARCHAR(32),
  `age` BIGINT,
  `office` VARCHAR(255),
  `role` VARCHAR(4)
) WITH (
  'connector' = 'print',
  'print-identifier' = 'source'
);

INSERT INTO `print-source`
SELECT
  `event_time`,
  `name`,
  `age`,
  `office`,
  `role`
FROM
  `mm-10139-source`;{code}

I get the following error

{code}
org.apache.flink.runtime.JobException: Recovery is suppressed by 
NoRestartBackoffTimeStrategy
at 
org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:116)
at 
org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:78)
at 
org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:224)
at 
org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:217)
at 
org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:208)
at 
org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:534)
at 
org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:89)
at 
org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:419)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:286)
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:201)
at 
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:154)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
at akka.actor.ActorCell.invoke(ActorCell.scala:561)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
at akka.dispatch.Mailbox.run(Mailbox.scala:225)
at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
at 
org.apache.flink.kinesis.shaded.com.amazonaws.ClientConfiguration.<init>(ClientConfiguration.java:47)
at 
org.apache.flink.kinesis.shaded.com.amazonaws.ClientConfigurationFactory.getDefaultConfig(ClientConfigurationFactory.java:46)
at 
org.apache.flink.kinesis.shaded.com.amazonaws.ClientConfigurationFactory.getConfig(ClientConfigurationFactory.java:36)
...
{code}
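
The missing class comes from {{commons-logging}}, which the relocated AWS SDK 
classes still expect on the classpath at runtime. A quick way to check a 
classpath for it (a minimal sketch, not part of the connector) is to probe for 
the class directly:

{code:java}
// Sketch only: fails the same way the connector does when commons-logging is absent.
public class CommonsLoggingProbe {
    public static void main(String[] args) {
        try {
            Class.forName("org.apache.commons.logging.LogFactory");
            System.out.println("commons-logging is on the classpath");
        } catch (ClassNotFoundException e) {
            System.out.println("commons-logging is missing; expect NoClassDefFoundError at runtime");
        }
    }
}
{code}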

[jira] [Updated] (FLINK-20043) Add flink-sql-connector-kinesis package

2020-11-09 Thread Alexander Alexandrov (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Alexandrov updated FLINK-20043:
-
Description: 
This is a follow up item after the recent addition of [Kinesis SQL source and 
sink in PR #13770|https://github.com/apache/flink/pull/13770].

Create a package that bundles a fat connector jar that can be used by SQL 
clients. See FLINK-11026 and the related PRs for a discussion how to do that.

  was:
This is a follow up item after the recent addition of [Kinesis SQL source and 
sink in PR #13770 |https://github.com/apache/flink/pull/13770].

Create a package that bundles a fat connector jar that can be used by SQL 
clients. See [FLINK-11026|https://issues.apache.org/jira/browse/FLINK-11026] 
and the related PRs for a discussion how to do that.


> Add flink-sql-connector-kinesis package
> ---
>
> Key: FLINK-20043
> URL: https://issues.apache.org/jira/browse/FLINK-20043
> Project: Flink
>  Issue Type: Improvement
>Reporter: Alexander Alexandrov
>Assignee: Alexander Alexandrov
>Priority: Major
> Attachments: third-party-report.html
>
>
> This is a follow up item after the recent addition of [Kinesis SQL source and 
> sink in PR #13770|https://github.com/apache/flink/pull/13770].
> Create a package that bundles a fat connector jar that can be used by SQL 
> clients. See FLINK-11026 and the related PRs for a discussion how to do that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-20043) Add flink-sql-connector-kinesis package

2020-11-09 Thread Alexander Alexandrov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228528#comment-17228528
 ] 

Alexander Alexandrov commented on FLINK-20043:
--

This is from the bundled {{NOTICE}}

{quote}
This project bundles the following dependencies under the Apache Software 
License 2.0. (http://www.apache.org/licenses/LICENSE-2.0.txt)

- com.amazonaws:amazon-kinesis-client:1.11.2
- com.amazonaws:amazon-kinesis-producer:0.14.0
- com.amazonaws:aws-java-sdk-core:1.11.754
- com.amazonaws:aws-java-sdk-dynamodb:1.11.603
- com.amazonaws:aws-java-sdk-kinesis:1.11.754
- com.amazonaws:aws-java-sdk-kms:1.11.603
- com.amazonaws:aws-java-sdk-s3:1.11.603
- com.amazonaws:aws-java-sdk-sts:1.11.754
- com.amazonaws:dynamodb-streams-kinesis-adapter:1.5.0
- com.amazonaws:jmespath-java:1.11.754
- org.apache.httpcomponents:httpclient:4.5.9
- org.apache.httpcomponents:httpcore:4.4.6

This project bundles the following dependencies under the BSD license.
See bundled license files for details.

- com.google.protobuf:protobuf-java:2.6.1
{quote}

> Add flink-sql-connector-kinesis package
> ---
>
> Key: FLINK-20043
> URL: https://issues.apache.org/jira/browse/FLINK-20043
> Project: Flink
>  Issue Type: Improvement
>Reporter: Alexander Alexandrov
>Assignee: Alexander Alexandrov
>Priority: Major
> Attachments: third-party-report.html
>
>
> This is a follow up item after the recent addition of [Kinesis SQL source and 
> sink in PR #13770 |https://github.com/apache/flink/pull/13770].
> Create a package that bundles a fat connector jar that can be used by SQL 
> clients. See [FLINK-11026|https://issues.apache.org/jira/browse/FLINK-11026] 
> and the related PRs for a discussion how to do that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-20043) Add flink-sql-connector-kinesis package

2020-11-09 Thread Alexander Alexandrov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228525#comment-17228525
 ] 

Alexander Alexandrov commented on FLINK-20043:
--

{quote}E.g. I read "Do not relocate guava because it is exposed in the Kinesis 
API", this should not apply to a SQL connector because everything is controlled 
by us.
{quote}
Just to clarify, you say that in a {{flink-sql-connector-kinesis}} fat jar we 
can safely relocate Guava code because the {{FlinkKinesisProducer}} (which is 
marked as {{@PublicEvolving}} and depends on some Guava classes) is not meant 
to be used directly by {{flink-sql-connector-kinesis}} clients?
{quote}I would make the decision if we need a dedicated 
{{flink-sql-connector-kinesis}} module up to the content of the current Kinesis 
jar. Does it include Jackson, Guava, JodaTime?
{quote}
 Upon manual inspection of the contents of [the 
{{flink-connector-kinesis_2.12-1.11.2.jar}} found in Maven 
Central|https://search.maven.org/remotecontent?filepath=org/apache/flink/flink-connector-kinesis_2.12/1.11.2/flink-connector-kinesis_2.12-1.11.2.jar]
 I cannot find traces of {{jackson}}, {{joda}}, or {{guava}} in the package 
contents, so I guess the answer is no.
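
For reference, this is roughly how that inspection can be reproduced 
programmatically (a minimal sketch; the local file name is assumed to match the 
artifact downloaded from Maven Central):

{code:java}
import java.io.File;
import java.io.IOException;
import java.util.jar.JarFile;

// Lists all entries in the connector jar that mention jackson, joda, or guava.
// Empty output supports the conclusion that none of them are bundled.
public class JarContentProbe {
    public static void main(String[] args) throws IOException {
        File jar = new File("flink-connector-kinesis_2.12-1.11.2.jar"); // assumed local copy
        try (JarFile jarFile = new JarFile(jar)) {
            jarFile.stream()
                   .map(entry -> entry.getName())
                   .filter(name -> name.contains("jackson")
                           || name.contains("joda")
                           || name.contains("guava"))
                   .forEach(System.out::println);
        }
    }
}
{code}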

> Add flink-sql-connector-kinesis package
> ---
>
> Key: FLINK-20043
> URL: https://issues.apache.org/jira/browse/FLINK-20043
> Project: Flink
>  Issue Type: Improvement
>Reporter: Alexander Alexandrov
>Assignee: Alexander Alexandrov
>Priority: Major
> Attachments: third-party-report.html
>
>
> This is a follow up item after the recent addition of [Kinesis SQL source and 
> sink in PR #13770 |https://github.com/apache/flink/pull/13770].
> Create a package that bundles a fat connector jar that can be used by SQL 
> clients. See [FLINK-11026|https://issues.apache.org/jira/browse/FLINK-11026] 
> and the related PRs for a discussion how to do that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-20043) Add flink-sql-connector-kinesis package

2020-11-09 Thread Alexander Alexandrov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228525#comment-17228525
 ] 

Alexander Alexandrov edited comment on FLINK-20043 at 11/9/20, 11:25 AM:
-

{quote}E.g. I read "Do not relocate guava because it is exposed in the Kinesis 
API", this should not apply to a SQL connector because everything is controlled 
by us.
{quote}
Just to clarify, you say that in a {{flink-sql-connector-kinesis}} fat jar we 
can safely relocate Guava code because the {{FlinkKinesisProducer}} (which is 
marked as {{@PublicEvolving}} and depends on some Guava classes) is not meant 
to be used directly by {{flink-sql-connector-kinesis}} clients?
{quote}I would make the decision if we need a dedicated 
{{flink-sql-connector-kinesis}} module up to the content of the current Kinesis 
jar. Does it include Jackson, Guava, JodaTime?
{quote}
Upon manual inspection of the contents of [the 
{{flink-connector-kinesis_2.12-1.11.2.jar}} found in Maven 
Central|https://search.maven.org/remotecontent?filepath=org/apache/flink/flink-connector-kinesis_2.12/1.11.2/flink-connector-kinesis_2.12-1.11.2.jar]
 I cannot find traces of {{jackson}}, {{joda}}, or {{guava}} in the package 
contents, so I guess the answer is no.


was (Author: aalexandrov):
{quote}E.g. I read "Do not relocate guava because it is exposed in the Kinesis 
API", this should not apply to a SQL connector because everything is controlled 
by us.
{quote}
Just to clarify, you say that in a {{flink-sql-connector-kinesis}} fat jar we 
can safely relocate Guava code because the {{FlinkKinesisProducer}} (which is 
marked as {{@PublicEvolving}} and depends on some Guava classes) is not meant 
to be used directly by {{flink-sql-connector-kinesis}} clients?
{quote}I would make the decision if we need a dedicated 
{{flink-sql-connector-kinesis}} module up to the content of the current Kinesis 
jar. Does it include Jackson, Guava, JodaTime?
{quote}
 Upon manual inspection of the contents of [the 
{{flink-connector-kinesis_2.12-1.11.2.jar}} found in Maven 
Central|https://search.maven.org/remotecontent?filepath=org/apache/flink/flink-connector-kinesis_2.12/1.11.2/flink-connector-kinesis_2.12-1.11.2.jar]
 I cannot find traces of {{jackson}}, {{joda}}, or {{guava}} in the package 
contents, so I guess the answer is no.

> Add flink-sql-connector-kinesis package
> ---
>
> Key: FLINK-20043
> URL: https://issues.apache.org/jira/browse/FLINK-20043
> Project: Flink
>  Issue Type: Improvement
>Reporter: Alexander Alexandrov
>Assignee: Alexander Alexandrov
>Priority: Major
> Attachments: third-party-report.html
>
>
> This is a follow up item after the recent addition of [Kinesis SQL source and 
> sink in PR #13770 |https://github.com/apache/flink/pull/13770].
> Create a package that bundles a fat connector jar that can be used by SQL 
> clients. See [FLINK-11026|https://issues.apache.org/jira/browse/FLINK-11026] 
> and the related PRs for a discussion how to do that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-20043) Add flink-sql-connector-kinesis package

2020-11-08 Thread Alexander Alexandrov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227979#comment-17227979
 ] 

Alexander Alexandrov edited comment on FLINK-20043 at 11/8/20, 12:06 PM:
-

BTW, unlike {{flink-connector-kafka}} and 
{{flink-connector-elasticsearch[6|7]}} the Maven jars published by the 
{{flink-connector-kinesis}} package are already fat. In that sense adding a 
separate {{flink-sql-connector-kinesis}} package just to build a fat jar seems 
like unnecessary overhead.

Should I remove [the {{maven-shade-plugin}} configuration in 
{{flink-connector-kinesis}}|https://github.com/apache/flink/blob/322a357f96bf60d3a89fd39ab4ae972bb272a758/flink-connectors/flink-connector-kinesis/pom.xml#L256-L330] 
as part of this change? This would break the assumptions made by existing clients 
(essentially, from 1.12.x on, if I want the fat jar I will have to pull the 
{{flink-sql-connector-kinesis}} package, even if I am not using the Table API 
or SQL features).
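
As an illustration of what existing fat-jar consumers currently rely on: the 
shaded artifact relocates the AWS SDK under {{org.apache.flink.kinesis.shaded}} 
(the same relocation pattern that shows up in the {{NoClassDefFoundError}} stack 
trace quoted elsewhere on this issue). A purely illustrative probe, not part of 
any Flink API:

{code:java}
// Sketch only: detects whether the shaded (fat) flavour of the connector is on the
// classpath by looking for a relocated AWS SDK class.
public class ShadedAwsSdkProbe {
    public static void main(String[] args) {
        String relocated = "org.apache.flink.kinesis.shaded.com.amazonaws.ClientConfiguration";
        try {
            Class.forName(relocated);
            System.out.println("shaded (fat) flink-connector-kinesis jar detected");
        } catch (ClassNotFoundException e) {
            System.out.println("no relocated AWS SDK found on the classpath");
        }
    }
}
{code}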


was (Author: aalexandrov):
BTW, unlike {{flink-connector-kafka}} and 
{{flink-connector-elasticsearch[6|7]}} the Maven jars published by the 
{{flink-connector-kinesis}} package are already fat. In that sense adding a 
separate {{flink-sql-connector-kinesis}} package just to build a fat jar seems 
like unnecessary overhead.

Should I remove [the {{maven-shade-plugin}} configuration in 
{{flink-connector-kinesis}} 
|https://github.com/apache/flink/blob/322a357f96bf60d3a89fd39ab4ae972bb272a758/flink-connectors/flink-connector-kinesis/pom.xml#L256-L330] as
 part of this change? This breaks the assumptions made of existing clients 
(essentially from 1.12.x if I want the fat jar I will have to pull the 
{{flink-sql-connector-kinesis}} package, even if I am not using the Table API 
or SQL features.

> Add flink-sql-connector-kinesis package
> ---
>
> Key: FLINK-20043
> URL: https://issues.apache.org/jira/browse/FLINK-20043
> Project: Flink
>  Issue Type: Improvement
>Reporter: Alexander Alexandrov
>Assignee: Alexander Alexandrov
>Priority: Major
> Attachments: third-party-report.html
>
>
> This is a follow up item after the recent addition of [Kinesis SQL source and 
> sink in PR #13770 |https://github.com/apache/flink/pull/13770].
> Create a package that bundles a fat connector jar that can be used by SQL 
> clients. See [FLINK-11026|https://issues.apache.org/jira/browse/FLINK-11026] 
> and the related PRs for a discussion how to do that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-20043) Add flink-sql-connector-kinesis package

2020-11-08 Thread Alexander Alexandrov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227979#comment-17227979
 ] 

Alexander Alexandrov edited comment on FLINK-20043 at 11/8/20, 12:06 PM:
-

BTW, unlike {{flink-connector-kafka}} and 
{{flink-connector-elasticsearch[6|7]}} the Maven jars published by the 
{{flink-connector-kinesis}} package are already fat. In that sense adding a 
separate {{flink-sql-connector-kinesis}} package just to build a fat jar seems 
like unnecessary overhead.

Should I remove [the {{maven-shade-plugin}} configuration in 
{{flink-connector-kinesis}} 
|https://github.com/apache/flink/blob/322a357f96bf60d3a89fd39ab4ae972bb272a758/flink-connectors/flink-connector-kinesis/pom.xml#L256-L330] as
 part of this change? This breaks the assumptions made of existing clients 
(essentially from 1.12.x if I want the fat jar I will have to pull the 
{{flink-sql-connector-kinesis}} package, even if I am not using the Table API 
or SQL features.


was (Author: aalexandrov):
BTW, unlike {{flink-connector-kafka}} and 
{{flink-connector-elasticsearch[6|7]}} the Maven jars published by the 
{{flink-connector-kinesis}} package are already fat. In that sense adding a 
separate {{flink-sql-connector-kinesis}} package doing the same thing seems 
like unnecessary overhead.

Should I remove [the {{maven-shade-plugin}} configuration in 
{{flink-connector-kinesis}} 
|https://github.com/apache/flink/blob/322a357f96bf60d3a89fd39ab4ae972bb272a758/flink-connectors/flink-connector-kinesis/pom.xml#L256-L330] as
 part of this change? This breaks the assumptions made of existing clients 
(essentially from 1.12.x if I want the fat jar I will have to pull the 
{{flink-sql-connector-kinesis}} package, even if I am not using the Table API 
or SQL features.

> Add flink-sql-connector-kinesis package
> ---
>
> Key: FLINK-20043
> URL: https://issues.apache.org/jira/browse/FLINK-20043
> Project: Flink
>  Issue Type: Improvement
>Reporter: Alexander Alexandrov
>Assignee: Alexander Alexandrov
>Priority: Major
> Attachments: third-party-report.html
>
>
> This is a follow up item after the recent addition of [Kinesis SQL source and 
> sink in PR #13770 |https://github.com/apache/flink/pull/13770].
> Create a package that bundles a fat connector jar that can be used by SQL 
> clients. See [FLINK-11026|https://issues.apache.org/jira/browse/FLINK-11026] 
> and the related PRs for a discussion how to do that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-20043) Add flink-sql-connector-kinesis package

2020-11-08 Thread Alexander Alexandrov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227979#comment-17227979
 ] 

Alexander Alexandrov edited comment on FLINK-20043 at 11/8/20, 12:05 PM:
-

BTW, unlike {{flink-connector-kafka}} and 
{{flink-connector-elasticsearch[6|7]}} the Maven jars published by the 
{{flink-connector-kinesis}} package are already fat. In that sense adding a 
separate {{flink-sql-connector-kinesis}} package doing the same thing seems 
like unnecessary overhead.

Should I remove [the {{maven-shade-plugin}} configuration in 
{{flink-connector-kinesis}} 
|https://github.com/apache/flink/blob/322a357f96bf60d3a89fd39ab4ae972bb272a758/flink-connectors/flink-connector-kinesis/pom.xml#L256-L330] as
 part of this change? This breaks the assumptions made of existing clients 
(essentially from 1.12.x if I want the fat jar I will have to pull the 
{{flink-sql-connector-kinesis}} package, even if I am not using the Table API 
or SQL features.


was (Author: aalexandrov):
BTW, unlike {{flink-connector-kafka}} and 
\{{flink-connector-elasticsearch[6|7]}}, the Maven jars published by the 
{{flink-connector-kinesis}} package are already fat. In that sense adding a 
separate {{flink-sql-connector-kinesis}} package doing the same thing seems 
like unnecessary overhead.

Should I remove [the {{maven-shade-plugin}} configuration in 
{{flink-connector-kinesis}} 
|https://github.com/apache/flink/blob/322a357f96bf60d3a89fd39ab4ae972bb272a758/flink-connectors/flink-connector-kinesis/pom.xml#L256-L330] as
 part of this change? This breaks the assumptions made of existing clients 
(essentially from 1.12.x if I want the fat jar I will have to pull the 
{{flink-sql-connector-kinesis}} package, even if I am not using the Table API 
or SQL features.

> Add flink-sql-connector-kinesis package
> ---
>
> Key: FLINK-20043
> URL: https://issues.apache.org/jira/browse/FLINK-20043
> Project: Flink
>  Issue Type: Improvement
>Reporter: Alexander Alexandrov
>Assignee: Alexander Alexandrov
>Priority: Major
> Attachments: third-party-report.html
>
>
> This is a follow up item after the recent addition of [Kinesis SQL source and 
> sink in PR #13770 |https://github.com/apache/flink/pull/13770].
> Create a package that bundles a fat connector jar that can be used by SQL 
> clients. See [FLINK-11026|https://issues.apache.org/jira/browse/FLINK-11026] 
> and the related PRs for a discussion how to do that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-20043) Add flink-sql-connector-kinesis package

2020-11-08 Thread Alexander Alexandrov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227979#comment-17227979
 ] 

Alexander Alexandrov edited comment on FLINK-20043 at 11/8/20, 12:05 PM:
-

BTW, unlike {{flink-connector-kafka}} and 
{{flink-connector-elasticsearch{6|7}}}, the Maven jars published by the 
{{flink-connector-kinesis}} package are already fat. In that sense adding a 
separate {{flink-sql-connector-kinesis}} package doing the same thing seems 
like unnecessary overhead.

Should I remove [the {{maven-shade-plugin}} configuration in 
{{flink-connector-kinesis}} 
|https://github.com/apache/flink/blob/322a357f96bf60d3a89fd39ab4ae972bb272a758/flink-connectors/flink-connector-kinesis/pom.xml#L256-L330] as
 part of this change? This breaks the assumptions made of existing clients 
(essentially from 1.12.x if I want the fat jar I will have to pull the 
{{flink-sql-connector-kinesis}} package, even if I am not using the Table API 
or SQL features.


was (Author: aalexandrov):
BTW the 2.11 and 2.12 jars published by the {{flink-connector-kinesis}} package 
are already fat (they include shaded transitive AWS dependencies), so adding a 
separate {{flink-sql-connector-kinesis}} package that does the same seems like 
an unnecessary overhead. Should I remove the [maven-shade-plugin 
|https://github.com/apache/flink/blob/322a357f96bf60d3a89fd39ab4ae972bb272a758/flink-connectors/flink-connector-kinesis/pom.xml#L256-L330]
 as part of this change?

> Add flink-sql-connector-kinesis package
> ---
>
> Key: FLINK-20043
> URL: https://issues.apache.org/jira/browse/FLINK-20043
> Project: Flink
>  Issue Type: Improvement
>Reporter: Alexander Alexandrov
>Assignee: Alexander Alexandrov
>Priority: Major
> Attachments: third-party-report.html
>
>
> This is a follow up item after the recent addition of [Kinesis SQL source and 
> sink in PR #13770 |https://github.com/apache/flink/pull/13770].
> Create a package that bundles a fat connector jar that can be used by SQL 
> clients. See [FLINK-11026|https://issues.apache.org/jira/browse/FLINK-11026] 
> and the related PRs for a discussion how to do that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-20043) Add flink-sql-connector-kinesis package

2020-11-08 Thread Alexander Alexandrov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227979#comment-17227979
 ] 

Alexander Alexandrov edited comment on FLINK-20043 at 11/8/20, 12:05 PM:
-

BTW, unlike {{flink-connector-kafka}} and 
\{{flink-connector-elasticsearch[6|7]}}, the Maven jars published by the 
{{flink-connector-kinesis}} package are already fat. In that sense adding a 
separate {{flink-sql-connector-kinesis}} package doing the same thing seems 
like unnecessary overhead.

Should I remove [the {{maven-shade-plugin}} configuration in 
{{flink-connector-kinesis}} 
|https://github.com/apache/flink/blob/322a357f96bf60d3a89fd39ab4ae972bb272a758/flink-connectors/flink-connector-kinesis/pom.xml#L256-L330] as
 part of this change? This breaks the assumptions made of existing clients 
(essentially from 1.12.x if I want the fat jar I will have to pull the 
{{flink-sql-connector-kinesis}} package, even if I am not using the Table API 
or SQL features.


was (Author: aalexandrov):
BTW, unlike {{flink-connector-kafka}} and 
{{flink-connector-elasticsearch{6|7}}}, the Maven jars published by the 
{{flink-connector-kinesis}} package are already fat. In that sense adding a 
separate {{flink-sql-connector-kinesis}} package doing the same thing seems 
like unnecessary overhead.

Should I remove [the {{maven-shade-plugin}} configuration in 
{{flink-connector-kinesis}} 
|https://github.com/apache/flink/blob/322a357f96bf60d3a89fd39ab4ae972bb272a758/flink-connectors/flink-connector-kinesis/pom.xml#L256-L330] as
 part of this change? This breaks the assumptions made of existing clients 
(essentially from 1.12.x if I want the fat jar I will have to pull the 
{{flink-sql-connector-kinesis}} package, even if I am not using the Table API 
or SQL features.

> Add flink-sql-connector-kinesis package
> ---
>
> Key: FLINK-20043
> URL: https://issues.apache.org/jira/browse/FLINK-20043
> Project: Flink
>  Issue Type: Improvement
>Reporter: Alexander Alexandrov
>Assignee: Alexander Alexandrov
>Priority: Major
> Attachments: third-party-report.html
>
>
> This is a follow up item after the recent addition of [Kinesis SQL source and 
> sink in PR #13770 |https://github.com/apache/flink/pull/13770].
> Create a package that bundles a fat connector jar that can be used by SQL 
> clients. See [FLINK-11026|https://issues.apache.org/jira/browse/FLINK-11026] 
> and the related PRs for a discussion how to do that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-20043) Add flink-sql-connector-kinesis package

2020-11-08 Thread Alexander Alexandrov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227979#comment-17227979
 ] 

Alexander Alexandrov commented on FLINK-20043:
--

BTW the 2.11 and 2.12 jars published by the {{flink-connector-kinesis}} package 
are already fat (they include shaded transitive AWS dependencies), so adding a 
separate {{flink-sql-connector-kinesis}} package that does the same seems like 
an unnecessary overhead. Should I remove the [maven-shade-plugin 
|https://github.com/apache/flink/blob/322a357f96bf60d3a89fd39ab4ae972bb272a758/flink-connectors/flink-connector-kinesis/pom.xml#L256-L330]
 as part of this change?

> Add flink-sql-connector-kinesis package
> ---
>
> Key: FLINK-20043
> URL: https://issues.apache.org/jira/browse/FLINK-20043
> Project: Flink
>  Issue Type: Improvement
>Reporter: Alexander Alexandrov
>Assignee: Alexander Alexandrov
>Priority: Major
> Attachments: third-party-report.html
>
>
> This is a follow up item after the recent addition of [Kinesis SQL source and 
> sink in PR #13770 |https://github.com/apache/flink/pull/13770].
> Create a package that bundles a fat connector jar that can be used by SQL 
> clients. See [FLINK-11026|https://issues.apache.org/jira/browse/FLINK-11026] 
> and the related PRs for a discussion how to do that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-20043) Add flink-sql-connector-kinesis package

2020-11-08 Thread Alexander Alexandrov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227969#comment-17227969
 ] 

Alexander Alexandrov commented on FLINK-20043:
--

{quote}are we allowed to ship a Kinesis fat jar license-wise?
{quote}

I ran the following command

{code}
mvn license:third-party-report -pl :flink-connector-kinesis_2.11
{code}

The produced [^third-party-report.html] shows that all AWS packages are under the 
Apache License 2.0. The ones that I am not sure about are 
{{javax.xml.bind:jaxb-api:2.3.1}}, 
{{javax.activation:javax.activation-api:1.2.0}}, and 
{{org.javassist:javassist:3.24.0-GA}}, which have GPL variants.

However, {{org.javassist:javassist:3.24.0-GA}} is also a dependency of 
{{flink-sql-connector-kafka}}. I have to check whether we strictly need the 
first two.


> Add flink-sql-connector-kinesis package
> ---
>
> Key: FLINK-20043
> URL: https://issues.apache.org/jira/browse/FLINK-20043
> Project: Flink
>  Issue Type: Improvement
>Reporter: Alexander Alexandrov
>Assignee: Alexander Alexandrov
>Priority: Major
> Attachments: third-party-report.html
>
>
> This is a follow up item after the recent addition of [Kinesis SQL source and 
> sink in PR #13770 |https://github.com/apache/flink/pull/13770].
> Create a package that bundles a fat connector jar that can be used by SQL 
> clients. See [FLINK-11026|https://issues.apache.org/jira/browse/FLINK-11026] 
> and the related PRs for a discussion how to do that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (FLINK-20043) Add flink-sql-connector-kinesis package

2020-11-08 Thread Alexander Alexandrov (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Alexandrov updated FLINK-20043:
-
Attachment: third-party-report.html

> Add flink-sql-connector-kinesis package
> ---
>
> Key: FLINK-20043
> URL: https://issues.apache.org/jira/browse/FLINK-20043
> Project: Flink
>  Issue Type: Improvement
>Reporter: Alexander Alexandrov
>Assignee: Alexander Alexandrov
>Priority: Major
> Attachments: third-party-report.html
>
>
> This is a follow up item after the recent addition of [Kinesis SQL source and 
> sink in PR #13770 |https://github.com/apache/flink/pull/13770].
> Create a package that bundles a fat connector jar that can be used by SQL 
> clients. See [FLINK-11026|https://issues.apache.org/jira/browse/FLINK-11026] 
> and the related PRs for a discussion how to do that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-20043) Add flink-sql-connector-kinesis package

2020-11-07 Thread Alexander Alexandrov (Jira)
Alexander Alexandrov created FLINK-20043:


 Summary: Add flink-sql-connector-kinesis package
 Key: FLINK-20043
 URL: https://issues.apache.org/jira/browse/FLINK-20043
 Project: Flink
  Issue Type: Improvement
Reporter: Alexander Alexandrov


This is a follow up item after the recent addition of [Kinesis SQL source and 
sink in PR #13770 |https://github.com/apache/flink/pull/13770].

Create a package that bundles a fat connector jar that can be used by SQL 
clients. See [FLINK-11026|https://issues.apache.org/jira/browse/FLINK-11026] 
and the related PRs for a discussion how to do that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-20042) Add end-to-end tests for Kinesis Table sources and sinks

2020-11-07 Thread Alexander Alexandrov (Jira)
Alexander Alexandrov created FLINK-20042:


 Summary: Add end-to-end tests for Kinesis Table sources and sinks
 Key: FLINK-20042
 URL: https://issues.apache.org/jira/browse/FLINK-20042
 Project: Flink
  Issue Type: Test
  Components: Connectors / Kinesis, Table SQL / Ecosystem
Reporter: Alexander Alexandrov


Follow-up issue to add end-to-end tests for the recently added 
{{KinesisDynamicSource}} and {{KinesisDynamicSink}}. See [the discussion in PR 
#13770| https://github.com/apache/flink/pull/13770] for details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18858) Kinesis Flink SQL Connector

2020-10-15 Thread Alexander Alexandrov (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17214786#comment-17214786
 ] 

Alexander Alexandrov commented on FLINK-18858:
--

Hello [~rmetzger], I picked this up from [~danny.cranmer], could you please 
re-assign this to me?

If everything goes smoothly I should be able to open a PR early next week.

> Kinesis Flink SQL Connector
> ---
>
> Key: FLINK-18858
> URL: https://issues.apache.org/jira/browse/FLINK-18858
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Kinesis, Table SQL / Ecosystem
>Reporter: Waldemar Hummer
>Assignee: Danny Cranmer
>Priority: Major
>
> Hi all,
> as far as I can see in the [list of 
> connectors|https://github.com/apache/flink/tree/master/flink-connectors], we 
> have a 
> {{[flink-connector-kinesis|https://github.com/apache/flink/tree/master/flink-connectors/flink-connector-kinesis]}}
>  for *programmatic access* to Kinesis streams, but there does not yet seem to 
> exist a *Kinesis SQL connector* (something like 
> {{flink-sql-connector-kinesis}}, analogous to {{flink-sql-connector-kafka}}).
> Our use case would be to enable SQL queries with direct access to Kinesis 
> sources (and potentially sinks), to enable something like the following Flink 
> SQL queries:
> {code:java}
>  $ bin/sql-client.sh embedded
> ...
> Flink SQL> CREATE TABLE Orders(`user` string, amount int, rowtime TIME) WITH 
> ('connector' = 'kinesis', ...);
> ...
> Flink SQL> SELECT * FROM Orders ...;
> ...{code}
>  
> I was wondering if this is something that has been considered, or is already 
> actively being worked on? If one of you can provide some guidance, we may be 
> able to work on a PoC implementation to add this functionality.
>  
> (Wasn't able to find an existing issue in the backlog - if this is a 
> duplicate, then please let me know as well.)
> Thanks!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (FLINK-4783) Allow to register TypeInfoFactories manually

2016-10-10 Thread Alexander Alexandrov (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Alexandrov reassigned FLINK-4783:
---

Assignee: Alexander Alexandrov

> Allow to register TypeInfoFactories manually
> 
>
> Key: FLINK-4783
> URL: https://issues.apache.org/jira/browse/FLINK-4783
> Project: Flink
>  Issue Type: Improvement
>  Components: Type Serialization System
>Affects Versions: 1.2.0
>Reporter: Till Rohrmann
>Assignee: Alexander Alexandrov
>Priority: Minor
> Fix For: 1.2.0
>
>
> The newly introduced {{TypeInfoFactories}} (FLINK-3042 and FLINK-3060) allow 
> to create {{TypeInformations}} for types which are annotated with 
> {{TypeInfo}}. This is useful if the user has control over the type for which 
> he wants to generate the {{TypeInformation}}.
> However, annotating a type is not always possible if the type comes from an 
> external library. In this case, it would be good to be able to directly 
> register a {{TypeInfoFactory}} without having to annotate the type.
> The {{TypeExtractor#registerFactory}} already has such a method. However, it 
> is declared private.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (FLINK-4783) Allow to register TypeInfoFactories manually

2016-10-10 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15562288#comment-15562288
 ] 

Alexander Alexandrov edited comment on FLINK-4783 at 10/10/16 1:18 PM:
---

Alright, I guess keeping {{TypeExtractor#registerFactory}} static makes sense, 
but it still does not solve the problem with types from an external library.

Maybe delegating through {{ExecutionConfig}} as suggested by [~till.rohrmann] 
is a cleaner way that will provide a means to throw an exception if the 
registration happens at a wrong position in the code.
This will also be on par with the similar method which already exists for 
registering Kryo serializers.

{code:java}
env.getConfig().registerTypeInfoFactory(Class<?> type, Class<? extends TypeInfoFactory<?>> factory)
{code}
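
To make the proposal concrete, a registration could look roughly like this (a 
sketch only: {{ExternalPoint}} and its factory are hypothetical, and 
{{registerTypeInfoFactory}} is the method proposed above, not an existing API):

{code:java}
import java.lang.reflect.Type;
import java.util.Map;

import org.apache.flink.api.common.typeinfo.TypeInfoFactory;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.GenericTypeInfo;

// Hypothetical type from an external library that we cannot annotate with @TypeInfo.
class ExternalPoint {
    public double x;
    public double y;
}

// Factory producing the TypeInformation for ExternalPoint; GenericTypeInfo is used
// here only as a placeholder for whatever TypeInformation actually fits the type.
class ExternalPointTypeInfoFactory extends TypeInfoFactory<ExternalPoint> {
    @Override
    public TypeInformation<ExternalPoint> createTypeInfo(
            Type t, Map<String, TypeInformation<?>> genericParameters) {
        return new GenericTypeInfo<>(ExternalPoint.class);
    }
}

// Registration at job setup time, analogous to registering a Kryo serializer
// (registerTypeInfoFactory is the proposed method, not part of the current API):
//   env.getConfig().registerTypeInfoFactory(ExternalPoint.class, ExternalPointTypeInfoFactory.class);
{code}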


was (Author: aalexandrov):
Alright, I guess keeping {{TypeExtractor#registerFactory}} static makes sense, 
but it still does not solve the problem with types from an external library.

Maybe delegating through {{ExecutionConfig}} as suggested by [~till.rohrmann] 
is a cleaner way that will provide means to throw an exception if the 
registration at a wrong position in the code.
This will also be on par with the similar method which already exists for 
registering Kryo serializers.

{code:scala}
env.getConfig().registerTypeInfoFactory(Class<?> type, Class<? extends TypeInfoFactory<?>> factory)
{code}

> Allow to register TypeInfoFactories manually
> 
>
> Key: FLINK-4783
> URL: https://issues.apache.org/jira/browse/FLINK-4783
> Project: Flink
>  Issue Type: Improvement
>  Components: Type Serialization System
>Affects Versions: 1.2.0
>Reporter: Till Rohrmann
>Priority: Minor
> Fix For: 1.2.0
>
>
> The newly introduced {{TypeInfoFactories}} (FLINK-3042 and FLINK-3060) allow 
> to create {{TypeInformations}} for types which are annotated with 
> {{TypeInfo}}. This is useful if the user has control over the type for which 
> he wants to generate the {{TypeInformation}}.
> However, annotating a type is not always possible if the type comes from an 
> external library. In this case, it would be good to be able to directly 
> register a {{TypeInfoFactory}} without having to annotate the type.
> The {{TypeExtractor#registerFactory}} already has such a method. However, it 
> is declared private.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-4783) Allow to register TypeInfoFactories manually

2016-10-10 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15562288#comment-15562288
 ] 

Alexander Alexandrov commented on FLINK-4783:
-

Alright, I guess keeping {{TypeExtractor#registerFactory}} static makes sense, 
but it still does not solve the problem with types from an external library.

Maybe delegating through {{ExecutionConfig}} as suggested by [~till.rohrmann] 
is a cleaner way that will provide a means to throw an exception if the 
registration happens at a wrong position in the code.
This will also be on par with the similar method which already exists for 
registering Kryo serializers.

{code:java}
env.getConfig().registerTypeInfoFactory(Class<?> type, Class<? extends TypeInfoFactory<?>> factory)
{code}

> Allow to register TypeInfoFactories manually
> 
>
> Key: FLINK-4783
> URL: https://issues.apache.org/jira/browse/FLINK-4783
> Project: Flink
>  Issue Type: Improvement
>  Components: Type Serialization System
>Affects Versions: 1.2.0
>Reporter: Till Rohrmann
>Priority: Minor
> Fix For: 1.2.0
>
>
> The newly introduced {{TypeInfoFactories}} (FLINK-3042 and FLINK-3060) allow 
> to create {{TypeInformations}} for types which are annotated with 
> {{TypeInfo}}. This is useful if the user has control over the type for which 
> he wants to generate the {{TypeInformation}}.
> However, annotating a type is not always possible if the type comes from an 
> external library. In this case, it would be good to be able to directly 
> register a {{TypeInfoFactory}} without having to annotate the type.
> The {{TypeExtractor#registerFactory}} already has such a method. However, it 
> is declared private.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2858) Cannot build Flink Scala 2.11 with IntelliJ

2015-10-17 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961846#comment-14961846
 ] 

Alexander Alexandrov commented on FLINK-2858:
-

I like the idea for a marker-based activation suggested here:

http://stackoverflow.com/a/8391313

What do you think?

> Cannot build Flink Scala 2.11 with IntelliJ
> ---
>
> Key: FLINK-2858
> URL: https://issues.apache.org/jira/browse/FLINK-2858
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Affects Versions: 0.10
>Reporter: Till Rohrmann
>
> If I activate the scala-2.11 profile from within IntelliJ (and thus 
> deactivate the scala-2.10 profile) in order to build Flink with Scala 2.11, 
> then Flink cannot be built. The problem is that some Scala macros cannot be 
> expanded because they were compiled with the wrong version (I assume 2.10).
> This makes debugging tests with Scala 2.11 in IntelliJ impossible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (FLINK-2858) Cannot build Flink Scala 2.11 with IntelliJ

2015-10-15 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959689#comment-14959689
 ] 

Alexander Alexandrov edited comment on FLINK-2858 at 10/15/15 10:29 PM:


[~trohrm...@apache.org] After trying around for a while I managed to overcome 
the issue. Here are the required steps

* Run the `tools/change-scala-version.sh` script
* Run `mvn clean` from the console
* Optionally, forcefully activate the `scala-2.11` profile & deactivate 
`scala-2.10` from the IntelliJ Maven panel. Strictly speaking, this is not 
needed because after running the script the default profile is changed to 
`scala-2.11`.
* In IntelliJ, click "Generate Sources and Update Folders for all Projects" in 
the same panel (but maybe a simple "Reimport all Maven projects" will be enough)
* In IntelliJ, run "Build > Rebuild Project"

This worked for me on the latest IntelliJ Ultimate (14.1.5).


was (Author: aalexandrov):
[~trohrm...@apache.org] After trying around for a while I managed to overcome 
the issue. Here are the steps

* Run the `tools/change-scala-version.sh` script
* Run `mvn clean` from the console
* Optionally, forcefully activate the `scala-2.11` profile & deactivate 
`scala-2.10` from the IntelliJ Maven panel. Strictly speaking, this is not 
needed because after running the script the default profile is changed to 
`scala-2.11`.
* In IntelliJ, click "Generate Sources and Update Folders for all Projects" in 
the same panel (but maybe a simple "Reimport all Maven projects" will be enough)
* In IntelliJ, run "Build > Rebuild Project"

This worked for me on the latest IntelliJ Ultimate (14.1.5).

> Cannot build Flink Scala 2.11 with IntelliJ
> ---
>
> Key: FLINK-2858
> URL: https://issues.apache.org/jira/browse/FLINK-2858
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Affects Versions: 0.10
>Reporter: Till Rohrmann
>
> If I activate the scala-2.11 profile from within IntelliJ (and thus 
> deactivate the scala-2.10 profile) in order to build Flink with Scala 2.11, 
> then Flink cannot be built. The problem is that some Scala macros cannot be 
> expanded because they were compiled with the wrong version (I assume 2.10).
> This makes debugging tests with Scala 2.11 in IntelliJ impossible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (FLINK-2858) Cannot build Flink Scala 2.11 with IntelliJ

2015-10-15 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959689#comment-14959689
 ] 

Alexander Alexandrov edited comment on FLINK-2858 at 10/15/15 10:28 PM:


[~trohrm...@apache.org] After trying around for a bit I managed to overcome the 
issue. Here are the steps

* Run the `tools/change-scala-version.sh` script
* Run `mvn clean` from the console
* Optionally, forcefully activate the `scala-2.11` profile & deactivate 
`scala-2.10` from the IntelliJ Maven panel. Strictly speaking, this is not 
needed because after running the script the default profile is changed to 
`scala-2.11`.
* In IntelliJ, click "Generate Sources and Update Folders for all Projects" in 
the same panel (but maybe a simple "Reimport all Maven projects" will be enough)
* In IntelliJ, run "Build > Rebuild Project"

This worked for me on the latest IntelliJ Ultimate (14.1.5).


was (Author: aalexandrov):
[~trohrm...@apache.org] After trying around for a bit I managed to overcome the 
issue. Here are the steps

* Run the `tools/change-scala-version.sh` script
* Run `mvn clean` from the console
* Optionally, forcefully activate the `scala-2.11` profile & deactivate 
`scala-2.10` from the IntelliJ Maven panel. Strictly speaking, this is not 
needed because after running the script the default profile is changed to 
`scala-2.11`.
* Click "Generate Sources and Update Folders for all Projects" in the same 
panel (but maybe a simple "Reimport all Maven projects" will be enough)
* Run "Build > Rebuild Project"

This worked for me on the latest IntelliJ Ultimate (14.1.5).

> Cannot build Flink Scala 2.11 with IntelliJ
> ---
>
> Key: FLINK-2858
> URL: https://issues.apache.org/jira/browse/FLINK-2858
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Affects Versions: 0.10
>Reporter: Till Rohrmann
>
> If I activate the scala-2.11 profile from within IntelliJ (and thus 
> deactivate the scala-2.10 profile) in order to build Flink with Scala 2.11, 
> then Flink cannot be built. The problem is that some Scala macros cannot be 
> expanded because they were compiled with the wrong version (I assume 2.10).
> This makes debugging tests with Scala 2.11 in IntelliJ impossible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2858) Cannot build Flink Scala 2.11 with IntelliJ

2015-10-15 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959750#comment-14959750
 ] 

Alexander Alexandrov commented on FLINK-2858:
-

I think that the current build description in the docs does not properly 
reflect what needs to be done. 

All you need to do is run the `tools/change-scala-version.sh` script; the 
default Scala profile is then changed automatically.

I tried to update this in [PR 1260|https://github.com/apache/flink/pull/1260].

> Cannot build Flink Scala 2.11 with IntelliJ
> ---
>
> Key: FLINK-2858
> URL: https://issues.apache.org/jira/browse/FLINK-2858
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Affects Versions: 0.10
>Reporter: Till Rohrmann
>
> If I activate the scala-2.11 profile from within IntelliJ (and thus 
> deactivate the scala-2.10 profile) in order to build Flink with Scala 2.11, 
> then Flink cannot be built. The problem is that some Scala macros cannot be 
> expanded because they were compiled with the wrong version (I assume 2.10).
> This makes debugging tests with Scala 2.11 in IntelliJ impossible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2858) Cannot build Flink Scala 2.11 with IntelliJ

2015-10-15 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959689#comment-14959689
 ] 

Alexander Alexandrov commented on FLINK-2858:
-

[~trohrm...@apache.org] After trying around for a bit I managed to overcome the 
issue. Here are the steps

* Run the `tools/change-scala-version.sh` script
* Run `mvn clean` from the console
* Activate the `scala-2.11` profile & deactivate `scala-2.10` from the IntelliJ 
Maven panel
* Click "Generate Sources and Update Folders for all Projects" in the same 
panel (but maybe a simple "Reimport all Maven projects" will be enough)
* Run "Build > Rebuild Project"

This worked for me on the latest IntelliJ Ultimate (14.1.5).

> Cannot build Flink Scala 2.11 with IntelliJ
> ---
>
> Key: FLINK-2858
> URL: https://issues.apache.org/jira/browse/FLINK-2858
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Affects Versions: 0.10
>Reporter: Till Rohrmann
>
> If I activate the scala-2.11 profile from within IntelliJ (and thus 
> deactivate the scala-2.10 profile) in order to build Flink with Scala 2.11, 
> then Flink cannot be built. The problem is that some Scala macros cannot be 
> expanded because they were compiled with the wrong version (I assume 2.10).
> This makes debugging tests with Scala 2.11 in IntelliJ impossible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (FLINK-2858) Cannot build Flink Scala 2.11 with IntelliJ

2015-10-15 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959689#comment-14959689
 ] 

Alexander Alexandrov edited comment on FLINK-2858 at 10/15/15 10:27 PM:


[~trohrm...@apache.org] After trying around for a bit I managed to overcome the 
issue. Here are the steps

* Run the `tools/change-scala-version.sh` script
* Run `mvn clean` from the console
* Optionally, forcefully activate the `scala-2.11` profile & deactivate 
`scala-2.10` from the IntelliJ Maven panel. Strictly speaking, this is not 
needed because after running the script the default profile is changed to 
`scala-2.11`.
* Click "Generate Sources and Update Folders for all Projects" in the same 
panel (but maybe a simple "Reimport all Maven projects" will be enough)
* Run "Build > Rebuild Project"

This worked for me on the latest IntelliJ Ultimate (14.1.5).


was (Author: aalexandrov):
[~trohrm...@apache.org] After trying around for a bit I managed to overcome the 
issue. Here are the steps

* Run the `tools/change-scala-version.sh` script
* Run `mvn clean` from the console
* Activate the `scala-2.11` profile & deactivate `scala-2.10` from the IntelliJ 
Maven panel
* Click "Generate Sources and Update Folders for all Projects" in the same 
panel (but maybe a simple "Reimport all Maven projects" will be enough)
* Run "Build > Rebuild Project"

This worked for me on the latest IntelliJ Ultimate (14.1.5).

> Cannot build Flink Scala 2.11 with IntelliJ
> ---
>
> Key: FLINK-2858
> URL: https://issues.apache.org/jira/browse/FLINK-2858
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Affects Versions: 0.10
>Reporter: Till Rohrmann
>
> If I activate the scala-2.11 profile from within IntelliJ (and thus 
> deactivate the scala-2.10 profile) in order to build Flink with Scala 2.11, 
> then Flink cannot be built. The problem is that some Scala macros cannot be 
> expanded because they were compiled with the wrong version (I assume 2.10).
> This makes debugging tests with Scala 2.11 in IntelliJ impossible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (FLINK-2858) Cannot build Flink Scala 2.11 with IntelliJ

2015-10-15 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959689#comment-14959689
 ] 

Alexander Alexandrov edited comment on FLINK-2858 at 10/15/15 10:30 PM:


[~trohrm...@apache.org] After trying around for a while I managed to overcome 
the issue. Here are the required steps

* Run the {{tools/change-scala-version.sh}} script
* Run {{mvn clean}} from the console
* Optionally, forcefully activate the {{scala-2.11}} profile & deactivate 
{{scala-2.10}} from the IntelliJ Maven panel. Strictly speaking this is not 
needed because after running the script the default profile is changed to 
{{scala-2.11}}.
* In IntelliJ, click "Generate Sources and Update Folders for all Projects" in 
the same panel (but maybe a simple "Reimport all Maven projects" will be enough)
* In IntelliJ, run "Build > Rebuild Project"

This worked for me on the latest IntelliJ Ultimate (14.1.5).


was (Author: aalexandrov):
[~trohrm...@apache.org] After trying around for a while I managed to overcome 
the issue. Here are the required steps

* Run the `tools/change-scala-version.sh` script
* Run `mvn clean` from the console
* Optionally, forcefully activate the `scala-2.11` profile & deactivate 
`scala-2.10` from the IntelliJ Maven panel. Strictly speaking this is not 
needed because after running the script the default profile is changed to 
`scala-2.11`.
* In IntelliJ, click "Generate Sources and Update Folders for all Projects" in 
the same panel (but maybe a simple "Reimport all Maven projects" will be enough)
* In IntelliJ, run "Build > Rebuild Project"

This worked for me on the latest IntelliJ Ultimate (14.1.5).

> Cannot build Flink Scala 2.11 with IntelliJ
> ---
>
> Key: FLINK-2858
> URL: https://issues.apache.org/jira/browse/FLINK-2858
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Affects Versions: 0.10
>Reporter: Till Rohrmann
>
> If I activate the scala-2.11 profile from within IntelliJ (and thus 
> deactivate the scala-2.10 profile) in order to build Flink with Scala 2.11, 
> then Flink cannot be built. The problem is that some Scala macros cannot be 
> expanded because they were compiled with the wrong version (I assume 2.10).
> This makes debugging tests with Scala 2.11 in IntelliJ impossible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (FLINK-2858) Cannot build Flink Scala 2.11 with IntelliJ

2015-10-15 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959689#comment-14959689
 ] 

Alexander Alexandrov edited comment on FLINK-2858 at 10/15/15 10:29 PM:


[~trohrm...@apache.org] After trying around for a while I managed to overcome 
the issue. Here are the steps

* Run the `tools/change-scala-version.sh` script
* Run `mvn clean` from the console
* Optionally, forcefully activate the `scala-2.11` profile & deactivate 
`scala-2.10` from the IntelliJ Maven panel. Strictly speaking this is not 
needed because after running the script the default profile is changed to 
`scala-2.11`.
* In IntelliJ, click "Generate Sources and Update Folders for all Projects" in 
the same panel (but maybe a simple "Reimport all Maven projects" will be enough)
* In IntelliJ, run "Build > Rebuild Project"

This worked for me on the latest IntelliJ Ultimate (14.1.5).


was (Author: aalexandrov):
[~trohrm...@apache.org] After trying around for a bit I managed to overcome the 
issue. Here are the steps:

* Run the `tools/change-scala-version.sh` script
* Run `mvn clean` from the console
* Optionally, forcefully activate the `scala-2.11` profile & deactivate 
`scala-2.10` from the IntelliJ Maven panel. Strictly speaking this is not 
needed because after running the script the default profile is changed to 
`scala-2.11`.
* In IntelliJ, click "Generate Sources and Update Folders for all Projects" in 
the same panel (but maybe a simple "Reimport all Maven projects" will be enough)
* In IntelliJ, run "Build > Rebuild Project"

This worked for me on the latest IntelliJ Ultimate (14.1.5).

> Cannot build Flink Scala 2.11 with IntelliJ
> ---
>
> Key: FLINK-2858
> URL: https://issues.apache.org/jira/browse/FLINK-2858
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Affects Versions: 0.10
>Reporter: Till Rohrmann
>
> If I activate the scala-2.11 profile from within IntelliJ (and thus 
> deactivate the scala-2.10 profile) in order to build Flink with Scala 2.11, 
> then Flink cannot be built. The problem is that some Scala macros cannot be 
> expanded because they were compiled with the wrong version (I assume 2.10).
> This makes debugging tests with Scala 2.11 in IntelliJ impossible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2858) Cannot build Flink Scala 2.11 with IntelliJ

2015-10-15 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959761#comment-14959761
 ] 

Alexander Alexandrov commented on FLINK-2858:
-

I think that it also makes sense to place a comment on the scala-2.11 profiles 
indicating that these are not meant to be overridden by the user. 

We can also try to pull the {{scala.binary.version}} property out to the 
general {{properties}} section and rework the profiles so they activate 
themselves based on the current {{scala.binary.version}} value, but AFAIR 
[~rmetzger] mentioned once that property-based activation is not really 
supported by Maven at the moment.

> Cannot build Flink Scala 2.11 with IntelliJ
> ---
>
> Key: FLINK-2858
> URL: https://issues.apache.org/jira/browse/FLINK-2858
> Project: Flink
>  Issue Type: Bug
>  Components: Build System
>Affects Versions: 0.10
>Reporter: Till Rohrmann
>
> If I activate the scala-2.11 profile from within IntelliJ (and thus 
> deactivate the scala-2.10 profile) in order to build Flink with Scala 2.11, 
> then Flink cannot be built. The problem is that some Scala macros cannot be 
> expanded because they were compiled with the wrong version (I assume 2.10).
> This makes debugging tests with Scala 2.11 in IntelliJ impossible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-2311) Set flink-* dependencies in flink-contrib as provided

2015-07-02 Thread Alexander Alexandrov (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Alexandrov updated FLINK-2311:

External issue URL: https://github.com/apache/flink/pull/880
 External issue ID: 880

 Set flink-* dependencies in flink-contrib as provided
 ---

 Key: FLINK-2311
 URL: https://issues.apache.org/jira/browse/FLINK-2311
 Project: Flink
  Issue Type: Improvement
  Components: flink-contrib
Affects Versions: 0.10, 0.9.1
Reporter: Alexander Alexandrov
Assignee: Alexander Alexandrov
Priority: Minor
  Labels: easyfix, maven, patch
 Fix For: 0.10, 0.9.1


 The {{flink-contrib}} folder is assumed to be provided by the user. As such, 
 other {{flink-*}} dependencies referenced within {{flink-contrib}} should be 
 set as _'provided'_ in order to keep the size of the user jars down. I'm 
 currently testing a patch that changes the poms as suggested and will open a 
 PR on GitHub if everything passes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-2311) Set flink-* dependencies in flink-contrib as provided

2015-07-02 Thread Alexander Alexandrov (JIRA)
Alexander Alexandrov created FLINK-2311:
---

 Summary: Set flink-* dependencies in flink-contrib as provided
 Key: FLINK-2311
 URL: https://issues.apache.org/jira/browse/FLINK-2311
 Project: Flink
  Issue Type: Improvement
  Components: flink-contrib
Affects Versions: 0.10, 0.9.1
Reporter: Alexander Alexandrov
Assignee: Alexander Alexandrov
Priority: Minor
 Fix For: 0.10, 0.9.1


The {{flink-contrib}} folder is assumed to be provided by the user. As such, 
other {{flink-*}} dependencies referenced within {{flink-contrib}} should be 
set as _'provided'_ in order to keep the size of the user jars down. I'm 
currently testing a patch that changes the poms as suggested and will open a PR 
on GitHub if everything passes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2237) Add hash-based Aggregation

2015-06-17 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14590727#comment-14590727
 ] 

Alexander Alexandrov commented on FLINK-2237:
-

In the Flink runtime, operators are represented by the `PactDriver` family of 
classes. I suggest proceeding as follows:

* read the code of an existing aggregate driver implementation (e.g. the 
`GroupReduceCombineDriver`);
* use this as a starting point: create a copy of the class, and adapt it as a 
first sketch of your hash-based driver;
* implement a test for your driver (you can use the corresponding 
`GroupReduceCombineDriver` test);
* push the sketch in a repository and summarize your attempt here.

We can then go over it and give you more concrete feedback.
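
To illustrate the hash-based combine idea itself (independent of the 
`PactDriver` interface), here is a minimal, self-contained sketch; the names 
and the in-memory map are only placeholders, not the runtime code:

{code}
import scala.collection.mutable

object HashCombineSketch {

  // Aggregate records into an in-memory hash table keyed by the grouping key
  // and emit the partial aggregates at the end (or when memory runs out).
  def combine[K, V](records: Iterator[(K, V)], reduce: (V, V) => V): Iterator[(K, V)] = {
    val table = mutable.HashMap.empty[K, V]
    for ((key, value) <- records) {
      table(key) = table.get(key).map(reduce(_, value)).getOrElse(value)
    }
    table.iterator
  }

  def main(args: Array[String]): Unit = {
    val input = Iterator(("a", 1), ("b", 2), ("a", 3))
    // Prints the partial sums, e.g. List((a,4), (b,2)).
    println(combine(input, (x: Int, y: Int) => x + y).toList)
  }
}
{code}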

 Add hash-based Aggregation
 --

 Key: FLINK-2237
 URL: https://issues.apache.org/jira/browse/FLINK-2237
 Project: Flink
  Issue Type: New Feature
Reporter: Rafiullah Momand
Priority: Minor
  Labels: github-import
 Fix For: pre-apache


 Aggregation functions at the moment are implemented in a sort-based way.
 How can we implement hash based Aggregation for Flink?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLINK-2231) Create a Serializer for Scala Enumerations

2015-06-17 Thread Alexander Alexandrov (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Alexandrov reassigned FLINK-2231:
---

Assignee: Alexander Alexandrov

 Create a Serializer for Scala Enumerations
 --

 Key: FLINK-2231
 URL: https://issues.apache.org/jira/browse/FLINK-2231
 Project: Flink
  Issue Type: Improvement
  Components: Scala API
Reporter: Stephan Ewen
Assignee: Alexander Alexandrov

 Scala Enumerations are currently serialized with Kryo, but should be 
 efficiently serialized by just writing the {{initial}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2231) Create a Serializer for Scala Enumerations

2015-06-17 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589814#comment-14589814
 ] 

Alexander Alexandrov commented on FLINK-2231:
-

If the point of the ticket is to extend the TypeInformation synthesis macro I 
think I can handle that these days.

[~StephanEwen] what do you mean by initial?
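
In case {{initial}} refers to the value's integer {{id}}, the idea would 
roughly be the following (a minimal sketch with plain Java data streams 
standing in for Flink's serializer interfaces, not the actual implementation):

{code}
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

object Weekday extends Enumeration {
  val Mon, Tue, Wed = Value
}

object EnumIdSerializationSketch {

  // Serialize an enumeration value by writing only its integer id ...
  def write(v: Weekday.Value, out: DataOutputStream): Unit = out.writeInt(v.id)

  // ... and deserialize it by looking the id up in the enumeration again.
  def read(in: DataInputStream): Weekday.Value = Weekday(in.readInt())

  def main(args: Array[String]): Unit = {
    val buffer = new ByteArrayOutputStream()
    write(Weekday.Tue, new DataOutputStream(buffer))
    val restored = read(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray)))
    println(restored == Weekday.Tue) // prints true
  }
}
{code}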

 Create a Serializer for Scala Enumerations
 --

 Key: FLINK-2231
 URL: https://issues.apache.org/jira/browse/FLINK-2231
 Project: Flink
  Issue Type: Improvement
  Components: Scala API
Reporter: Stephan Ewen
Assignee: Alexander Alexandrov

 Scala Enumerations are currently serialized with Kryo, but should be 
 efficiently serialized by just writing the {{initial}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library

2015-05-31 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566564#comment-14566564
 ] 

Alexander Alexandrov commented on FLINK-1731:
-

[~till.rohrmann] I couldn't find it either. I think we were discussing doing the 
K-Means|| as a separate issue.

Florian Gößler also reported the following issue when he tried to rebase:

{noformat}
Error:(200, 75) ambiguous implicit values:
 both value denseVectorConverter in object BreezeVectorConverter of type => 
org.apache.flink.ml.math.BreezeVectorConverter[org.apache.flink.ml.math.DenseVector]
 and value sparseVectorConverter in object BreezeVectorConverter of type => 
org.apache.flink.ml.math.BreezeVectorConverter[org.apache.flink.ml.math.SparseVector]
 match expected type org.apache.flink.ml.math.BreezeVectorConverter[T]
.reduce((p1, p2) => (p1._1, (p1._2.asBreeze + 
p2._2.asBreeze).fromBreeze, p1._3 + p2._3))
{noformat}

Any idea what might be the cause?

 Add kMeans clustering algorithm to machine learning library
 ---

 Key: FLINK-1731
 URL: https://issues.apache.org/jira/browse/FLINK-1731
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann
Assignee: Peter Schrott
  Labels: ML

 The Flink repository already contains a kMeans implementation but it is not 
 yet ported to the machine learning library. I assume that only the used data 
 types have to be adapted and then it can be more or less directly moved to 
 flink-ml.
 The kMeans++ [1] and the kMeans|| [2] algorithm constitute a better 
 implementation because the improve the initial seeding phase to achieve near 
 optimal clustering. It might be worthwhile to implement kMeans||.
 Resources:
 [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
 [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library

2015-05-20 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14552113#comment-14552113
 ] 

Alexander Alexandrov commented on FLINK-1731:
-

[~till.rohrmann] Can we merge this with the current set of features and then 
add the automatic picking of the initial centroids in another issue?

 Add kMeans clustering algorithm to machine learning library
 ---

 Key: FLINK-1731
 URL: https://issues.apache.org/jira/browse/FLINK-1731
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann
Assignee: Peter Schrott
  Labels: ML

 The Flink repository already contains a kMeans implementation but it is not 
 yet ported to the machine learning library. I assume that only the used data 
 types have to be adapted and then it can be more or less directly moved to 
 flink-ml.
 The kMeans++ [1] and the kMeans|| [2] algorithm constitute a better 
 implementation because the improve the initial seeding phase to achieve near 
 optimal clustering. It might be worthwhile to implement kMeans||.
 Resources:
 [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
 [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2044) Implementation of Gelly Algorithm HITS

2015-05-19 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550568#comment-14550568
 ] 

Alexander Alexandrov commented on FLINK-2044:
-

The [feature branch can be found 
here|https://github.com/JavidMayar/flink/commits/HITS].

 Implementation of Gelly Algorithm HITS
 --

 Key: FLINK-2044
 URL: https://issues.apache.org/jira/browse/FLINK-2044
 Project: Flink
  Issue Type: Improvement
  Components: Gelly
Reporter: Ahamd Javid
Priority: Minor
 Attachments: HitsMain.java, Hits_Class.java


 Implementation of Hits Algorithm in Gelly API using Java 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2023) TypeExtractor does not work for (some) Scala Classes

2015-05-18 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547706#comment-14547706
 ] 

Alexander Alexandrov commented on FLINK-2023:
-

For a lot of operators, Flink's DataSet API provides overloaded method 
signatures where a user can explicitly pass a TypeInformation object. If this 
is not given, Flink falls back to the TypeExtractor logic. 

If you follow a similar pattern in Gelly, you can 

1. wrap the Java API in Scala;
2. synthesize the TypeInformation using the already present Scala macros;
3. pass the synthesized object using the explicit signatures in Gelly.

As a prerequisite for that, the TypeExtractor logic should be extended in 
order to work with Scala base types.
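
As an illustration, the {{createTypeInformation}} macro from {{flink-scala}} 
can synthesize the {{TypeInformation}} at compile time; the resulting object 
can then be passed to any Java API method that offers an explicit 
{{TypeInformation}} overload. A minimal sketch (assuming {{flink-scala}} is on 
the classpath):

{code}
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.api.scala.createTypeInformation

object TypeInfoSynthesisSketch {

  // The macro runs at compile time, so generic parameters are preserved
  // instead of being erased to java.lang.Object.
  val tupleInfo: TypeInformation[(String, Long)] = createTypeInformation[(String, Long)]

  def main(args: Array[String]): Unit = {
    // Prints the synthesized type information with String and Long as field types.
    println(tupleInfo)
  }
}
{code}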

 TypeExtractor does not work for (some) Scala Classes
 

 Key: FLINK-2023
 URL: https://issues.apache.org/jira/browse/FLINK-2023
 Project: Flink
  Issue Type: Bug
  Components: Type Serialization System
Reporter: Aljoscha Krettek

 [~vanaepi] discovered some problems while working on the Scala Gelly API 
 where, for example, a Scala MapFunction can not be correctly analyzed by the 
 type extractor. For example, generic types will not be correctly detected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2023) TypeExtractor does not work for (some) Scala Classes

2015-05-18 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547764#comment-14547764
 ] 

Alexander Alexandrov commented on FLINK-2023:
-

This was my basic idea, yes. In the example you have given, things could 
probably be simplified a bit, as the type info related to the input is already 
properly set. In the example above, you just need to pass the {{returnType}}.

 TypeExtractor does not work for (some) Scala Classes
 

 Key: FLINK-2023
 URL: https://issues.apache.org/jira/browse/FLINK-2023
 Project: Flink
  Issue Type: Bug
  Components: Type Serialization System
Reporter: Aljoscha Krettek

 [~vanaepi] discovered some problems while working on the Scala Gelly API 
 where, for example, a Scala MapFunction can not be correctly analyzed by the 
 type extractor. For example, generic types will not be correctly detected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2023) TypeExtractor does not work for (some) Scala Classes

2015-05-17 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547105#comment-14547105
 ] 

Alexander Alexandrov commented on FLINK-2023:
-

I suggest having a thin layer which implicitly provides the TypeInformation 
instances over the Gelly Java API in Scala (similar to what [~aljoscha] has 
done with the DataSet Scala API).

 TypeExtractor does not work for (some) Scala Classes
 

 Key: FLINK-2023
 URL: https://issues.apache.org/jira/browse/FLINK-2023
 Project: Flink
  Issue Type: Bug
  Components: Type Serialization System
Reporter: Aljoscha Krettek

 [~vanaepi] discovered some problems while working on the Scala Gelly API 
 where, for example, a Scala MapFunction can not be correctly analyzed by the 
 type extractor. For example, generic types will not be correctly detected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2023) TypeExtractor does not work for (some) Scala Classes

2015-05-16 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546969#comment-14546969
 ] 

Alexander Alexandrov commented on FLINK-2023:
-

I think there is no easy way around this. Generic parameters in Scala classes 
are erased and at the Java level (on which the TypeExtractor operates) you will 
see only {{java.lang.Object}} instances instead. For a more detailed 
description of the problem with some examples check [this Stackoverflow 
entry|http://stackoverflow.com/questions/20749536/why-dont-scala-primitives-show-up-as-type-parameters-in-java-reflection].
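
A minimal, Flink-independent illustration of the problem: the Scala type 
parameter is simply not visible through Java reflection, which is all the 
TypeExtractor has to work with.

{code}
class Holder[T](val value: T)

object ErasureDemo {
  def main(args: Array[String]): Unit = {
    // Look at the generated getter for "value" through plain Java reflection,
    // which is essentially what the TypeExtractor does.
    val getter = classOf[Holder[Int]].getMethod("value")
    // Prints "class java.lang.Object" - the Int parameter has been erased.
    println(getter.getReturnType)
  }
}
{code}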

 TypeExtractor does not work for (some) Scala Classes
 

 Key: FLINK-2023
 URL: https://issues.apache.org/jira/browse/FLINK-2023
 Project: Flink
  Issue Type: Bug
  Components: Type Serialization System
Reporter: Aljoscha Krettek

 [~vanaepi] discovered some problems while working on the Scala Gelly API 
 where, for example, a Scala MapFunction can not be correctly analyzed by the 
 type extractor. For example, generic types will not be correctly detected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2023) TypeExtractor does not work for (some) Scala Classes

2015-05-16 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546970#comment-14546970
 ] 

Alexander Alexandrov commented on FLINK-2023:
-

Long story short - you cannot use the Java API directly from Scala (i.e., with 
Scala types) and rely on correct type inference from the TypeExtractor.

 TypeExtractor does not work for (some) Scala Classes
 

 Key: FLINK-2023
 URL: https://issues.apache.org/jira/browse/FLINK-2023
 Project: Flink
  Issue Type: Bug
  Components: Type Serialization System
Reporter: Aljoscha Krettek

 [~vanaepi] discovered some problems while working on the Scala Gelly API 
 where, for example, a Scala MapFunction can not be correctly analyzed by the 
 type extractor. For example, generic types will not be correctly detected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library

2015-05-14 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543734#comment-14543734
 ] 

Alexander Alexandrov commented on FLINK-1731:
-

I would go with a {{DataSet}} for the centroids as well. That said, we can 
reduce syntax at the client side by providing either

- an implicit converter {{Seq\[A\] => DataSet\[A\]}} (needs to be part of 
the Flink Scala API, could be already there), or
- an overloaded {{setCentroids(Seq\[A\])}} setter (see the sketch below).
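
A minimal sketch of both options (class and method names are assumptions, not 
the actual flink-ml API):

{code}
import scala.reflect.ClassTag
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.api.scala._

class KMeansSketch[A] {

  private var centroids: Option[DataSet[A]] = None

  // The DataSet-based setter, i.e. the "proper" input of the algorithm.
  def setCentroids(ds: DataSet[A]): this.type = {
    centroids = Some(ds)
    this
  }

  // Overloaded convenience setter that wraps a plain Seq into a DataSet;
  // an implicit Seq[A] => DataSet[A] conversion would achieve the same effect.
  def setCentroids(seq: Seq[A])(implicit env: ExecutionEnvironment,
                                info: TypeInformation[A],
                                tag: ClassTag[A]): this.type =
    setCentroids(env.fromCollection(seq))
}
{code}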

 Add kMeans clustering algorithm to machine learning library
 ---

 Key: FLINK-1731
 URL: https://issues.apache.org/jira/browse/FLINK-1731
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann
Assignee: Peter Schrott
  Labels: ML

 The Flink repository already contains a kMeans implementation but it is not 
 yet ported to the machine learning library. I assume that only the used data 
 types have to be adapted and then it can be more or less directly moved to 
 flink-ml.
 The kMeans++ [1] and the kMeans|| [2] algorithm constitute a better 
 implementation because the improve the initial seeding phase to achieve near 
 optimal clustering. It might be worthwhile to implement kMeans||.
 Resources:
 [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
 [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-1743) Add multinomial logistic regression to machine learning library

2015-05-14 Thread Alexander Alexandrov (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Alexandrov updated FLINK-1743:

Assignee: (was: Alexander Alexandrov)

 Add multinomial logistic regression to machine learning library
 ---

 Key: FLINK-1743
 URL: https://issues.apache.org/jira/browse/FLINK-1743
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann
  Labels: ML

 Multinomial logistic regression [1] would be good first classification 
 algorithm which can classify multiple classes. 
 Resources:
 [1] [http://en.wikipedia.org/wiki/Multinomial_logistic_regression]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-1731) Add kMeans clustering algorithm to machine learning library

2015-05-14 Thread Alexander Alexandrov (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Alexandrov updated FLINK-1731:

Assignee: (was: Alexander Alexandrov)

 Add kMeans clustering algorithm to machine learning library
 ---

 Key: FLINK-1731
 URL: https://issues.apache.org/jira/browse/FLINK-1731
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann
  Labels: ML

 The Flink repository already contains a kMeans implementation but it is not 
 yet ported to the machine learning library. I assume that only the used data 
 types have to be adapted and then it can be more or less directly moved to 
 flink-ml.
 The kMeans++ [1] and the kMeans|| [2] algorithm constitute a better 
 implementation because the improve the initial seeding phase to achieve near 
 optimal clustering. It might be worthwhile to implement kMeans||.
 Resources:
 [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
 [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library

2015-05-14 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543643#comment-14543643
 ] 

Alexander Alexandrov commented on FLINK-1731:
-

[~peedeeX21] for some reason I cannot assign this to you directly. I cleared 
the assignee field so you can assign the issue to yourself. 

 Add kMeans clustering algorithm to machine learning library
 ---

 Key: FLINK-1731
 URL: https://issues.apache.org/jira/browse/FLINK-1731
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann
  Labels: ML

 The Flink repository already contains a kMeans implementation but it is not 
 yet ported to the machine learning library. I assume that only the used data 
 types have to be adapted and then it can be more or less directly moved to 
 flink-ml.
 The kMeans++ [1] and the kMeans|| [2] algorithm constitute a better 
 implementation because the improve the initial seeding phase to achieve near 
 optimal clustering. It might be worthwhile to implement kMeans||.
 Resources:
 [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
 [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (FLINK-1731) Add kMeans clustering algorithm to machine learning library

2015-05-14 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543734#comment-14543734
 ] 

Alexander Alexandrov edited comment on FLINK-1731 at 5/14/15 2:35 PM:
--

I would go with a {{DataSet}} for the centroids as well. That said, we can 
reduce syntax at the client side by providing either

- an overloaded {{setCentroids(Seq\[A\])}} setter, or
- an implicit converter of type {{Seq\[A\] => DataSet\[A\]}} (needs to be part 
of the Flink Scala API, could be already there) which allows passing a 
{{Seq\[A\]}} argument to a {{setCentroids(DataSet\[A\])}} setter.


was (Author: aalexandrov):
I would go with a {{DataSet}} for the centroids as well. That said, we can 
reduce syntax at the client side by providing either

- an implicit converter that {{Seq\[A\] => DataSet\[A\]}} (needs to be part of 
the Flink Scala API, could be already there), or
- an overloaded {{setCentroids(Seq\[A\])}} setter.

 Add kMeans clustering algorithm to machine learning library
 ---

 Key: FLINK-1731
 URL: https://issues.apache.org/jira/browse/FLINK-1731
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann
Assignee: Peter Schrott
  Labels: ML

 The Flink repository already contains a kMeans implementation but it is not 
 yet ported to the machine learning library. I assume that only the used data 
 types have to be adapted and then it can be more or less directly moved to 
 flink-ml.
 The kMeans++ [1] and the kMeans|| [2] algorithm constitute a better 
 implementation because the improve the initial seeding phase to achieve near 
 optimal clustering. It might be worthwhile to implement kMeans||.
 Resources:
 [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
 [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library

2015-05-13 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541570#comment-14541570
 ] 

Alexander Alexandrov commented on FLINK-1731:
-

I suggest trying to add the initial centroids as a proper parameter and not as 
part of the ParameterMap, since they are an actual input to the algorithm (as 
opposed to an algorithm hyper-parameter).

 Add kMeans clustering algorithm to machine learning library
 ---

 Key: FLINK-1731
 URL: https://issues.apache.org/jira/browse/FLINK-1731
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann
Assignee: Alexander Alexandrov
  Labels: ML

 The Flink repository already contains a kMeans implementation but it is not 
 yet ported to the machine learning library. I assume that only the used data 
 types have to be adapted and then it can be more or less directly moved to 
 flink-ml.
 The kMeans++ [1] and the kMeans|| [2] algorithm constitute a better 
 implementation because the improve the initial seeding phase to achieve near 
 optimal clustering. It might be worthwhile to implement kMeans||.
 Resources:
 [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
 [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1959) Accumulators BROKEN after Partitioning

2015-05-12 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540146#comment-14540146
 ] 

Alexander Alexandrov commented on FLINK-1959:
-

After some more debugging with [~elbehery] we managed to identify the issue.

Adding {{partitionByHash}} into the pipeline breaks the local chain in two and 
creates two tasks. The second task (the receiver of the partition) starts with 
a RegularPactTask which runs a NoOpDriver and does not have a UDF stub. Because 
of that, the check at [line 
511|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/operators/RegularPactTask.java#L511]
 is not satisfied and the accumulators are never reported. 

@[~StephanEwen]: Unless you see a problem with that solution, I suggest removing 
the {{stub != null}} check at [line 
511|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/operators/RegularPactTask.java#L511]
 and always reporting the accumulators.

Regards,
Alexander


 Accumulators BROKEN after Partitioning
 --

 Key: FLINK-1959
 URL: https://issues.apache.org/jira/browse/FLINK-1959
 Project: Flink
  Issue Type: Bug
  Components: Examples
Affects Versions: master
Reporter: mustafa elbehery
Priority: Critical
 Fix For: master


 while running the Accumulator example in 
 https://github.com/Elbehery/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/relational/EmptyFieldsCountAccumulator.java,
  
 I tried to alter the data flow with PartitionByHash function before 
 applying Filter, and the resulted accumulator was NULL. 
 By Debugging, I could see the accumulator in the RunTime Map. However, by 
 retrieving the accumulator from the JobExecutionResult object, it was NULL. 
 The line caused the problem is file.partitionByHash(1).filter(new 
 EmptyFieldFilter()) instead of file.filter(new EmptyFieldFilter())



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1999) TF-IDF transformer

2015-05-12 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540156#comment-14540156
 ] 

Alexander Alexandrov commented on FLINK-1999:
-

[~vsldimov] K is the document ID, not a word from the document. The 
`SparseVector` contains the transformed bag-of-words model for the document 
(originally stored in the point value of type Seq[String] in the input).

 TF-IDF transformer
 --

 Key: FLINK-1999
 URL: https://issues.apache.org/jira/browse/FLINK-1999
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Ronny Bräunlich
Assignee: Alexander Alexandrov
Priority: Minor
  Labels: ML

 Hello everybody,
 we are a group of three students from TU Berlin (I guess we're not the first 
 group creating an issue) and we want to/have to implement a tf-idf tranformer 
 for Flink.
 Our lecturer Alexander told us that we could get some guidance here and that 
 you could point us to an old version of a similar tranformer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1999) TF-IDF transformer

2015-05-12 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539539#comment-14539539
 ] 

Alexander Alexandrov commented on FLINK-1999:
-

I'm not sure whether Seq[String] is enough as a type. In particular, I am not 
seeing how you can then classify the original items, as the transformed 
`DataSet` does not hold any document ID.

I would therefore suggest a more generic type:

{code}
case class Point[K, V](id: K, value: V) {}
{code}

The type of the transformer should then be {{Point[K, Seq[String]] => Point[K, 
SparseVector[Double]]}}, where the point value is a SparseVector[Double] that 
encodes the tf-idf values of the words occurring in the document.
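
Putting the two together, the suggested shape would roughly be the following 
(type and method names are assumptions, not the actual flink-ml API; 
SparseVector here is Breeze's generic sparse vector):

{code}
import breeze.linalg.SparseVector
import org.apache.flink.api.scala._

// K is the document id, the value carries the per-document payload.
case class Point[K, V](id: K, value: V)

trait TfIdfTransformerSketch[K] {
  // Bag-of-words documents go in, tf-idf weighted sparse vectors come out;
  // the document id K is carried through so the results can be joined back
  // to the original items.
  def transform(docs: DataSet[Point[K, Seq[String]]]): DataSet[Point[K, SparseVector[Double]]]
}
{code}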

 TF-IDF transformer
 --

 Key: FLINK-1999
 URL: https://issues.apache.org/jira/browse/FLINK-1999
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Ronny Bräunlich
Assignee: Alexander Alexandrov
Priority: Minor
  Labels: ML

 Hello everybody,
 we are a group of three students from TU Berlin (I guess we're not the first 
 group creating an issue) and we want to/have to implement a tf-idf tranformer 
 for Flink.
 Our lecturer Alexander told us that we could get some guidance here and that 
 you could point us to an old version of a similar tranformer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-1959) Accumulators BROKEN after Partitioning

2015-05-11 Thread Alexander Alexandrov (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Alexandrov updated FLINK-1959:

Affects Version/s: (was: 0.8.1)
   master
Fix Version/s: (was: 0.8.1)
   master

 Accumulators BROKEN after Partitioning
 --

 Key: FLINK-1959
 URL: https://issues.apache.org/jira/browse/FLINK-1959
 Project: Flink
  Issue Type: Bug
  Components: Examples
Affects Versions: master
Reporter: mustafa elbehery
Priority: Critical
 Fix For: master


 while running the Accumulator example in 
 https://github.com/Elbehery/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/relational/EmptyFieldsCountAccumulator.java,
  
 I tried to alter the data flow with PartitionByHash function before 
 applying Filter, and the resulted accumulator was NULL. 
 By Debugging, I could see the accumulator in the RunTime Map. However, by 
 retrieving the accumulator from the JobExecutionResult object, it was NULL. 
 The line caused the problem is file.partitionByHash(1).filter(new 
 EmptyFieldFilter()) instead of file.filter(new EmptyFieldFilter())



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1999) TF-IDF transformer

2015-05-11 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537853#comment-14537853
 ] 

Alexander Alexandrov commented on FLINK-1999:
-

What does a Seq[String] represent?

 TF-IDF transformer
 --

 Key: FLINK-1999
 URL: https://issues.apache.org/jira/browse/FLINK-1999
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Ronny Bräunlich
Assignee: Alexander Alexandrov
Priority: Minor
  Labels: ML

 Hello everybody,
 we are a group of three students from TU Berlin (I guess we're not the first 
 group creating an issue) and we want to/have to implement a tf-idf tranformer 
 for Flink.
 Our lecturer Alexander told us that we could get some guidance here and that 
 you could point us to an old version of a similar tranformer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1959) Accumulators BROKEN after Partitioning

2015-05-11 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537845#comment-14537845
 ] 

Alexander Alexandrov commented on FLINK-1959:
-

I think that some communication / traversal chain gets broken in the 
PartitionByHash node.

You can either 

(1) try to dig through the code and see where this happens, or
(2) use an alternative to the accumulator until the issue is resolved, e.g. 
write the information to a pre-defined HDFS path (see the sketch below). 
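
A minimal sketch of option (2), with the path and the counting logic chosen 
arbitrarily for illustration (not the original example):

{code}
import org.apache.flink.api.scala._

object AccumulatorWorkaroundSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val lines = env.fromElements("a,b,", ",,c", "d,e,f")

    // Derive the count as a regular DataSet instead of an accumulator ...
    val emptyFieldCount = lines
      .map(line => line.split(",", -1).count(_.isEmpty))
      .reduce(_ + _)

    // ... and write it to a pre-defined path that can be read after the job.
    emptyFieldCount.writeAsText("/tmp/empty-field-count")
    env.execute("empty field count without accumulators")
  }
}
{code}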

 Accumulators BROKEN after Partitioning
 --

 Key: FLINK-1959
 URL: https://issues.apache.org/jira/browse/FLINK-1959
 Project: Flink
  Issue Type: Bug
  Components: Examples
Affects Versions: 0.8.1
Reporter: mustafa elbehery
Priority: Critical
 Fix For: 0.8.1


 while running the Accumulator example in 
 https://github.com/Elbehery/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/relational/EmptyFieldsCountAccumulator.java,
  
 I tried to alter the data flow with PartitionByHash function before 
 applying Filter, and the resulted accumulator was NULL. 
 By Debugging, I could see the accumulator in the RunTime Map. However, by 
 retrieving the accumulator from the JobExecutionResult object, it was NULL. 
 The line caused the problem is file.partitionByHash(1).filter(new 
 EmptyFieldFilter()) instead of file.filter(new EmptyFieldFilter())



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1999) TF-IDF transformer

2015-05-10 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537358#comment-14537358
 ] 

Alexander Alexandrov commented on FLINK-1999:
-

I agree with [~Felix Neutatz]; you can use the PR for the feature hasher as a 
reference.

 TF-IDF transformer
 --

 Key: FLINK-1999
 URL: https://issues.apache.org/jira/browse/FLINK-1999
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Ronny Bräunlich
Assignee: Alexander Alexandrov
Priority: Minor
  Labels: ML

 Hello everybody,
 we are a group of three students from TU Berlin (I guess we're not the first 
 group creating an issue) and we want to/have to implement a tf-idf tranformer 
 for Flink.
 Our lecturer Alexander told us that we could get some guidance here and that 
 you could point us to an old version of a similar tranformer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1848) Paths containing a Windows drive letter cannot be used in FileOutputFormats

2015-04-23 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509282#comment-14509282
 ] 

Alexander Alexandrov commented on FLINK-1848:
-

I think this should be fixed (with the PR) and can be closed now, right?

 Paths containing a Windows drive letter cannot be used in FileOutputFormats
 ---

 Key: FLINK-1848
 URL: https://issues.apache.org/jira/browse/FLINK-1848
 Project: Flink
  Issue Type: Bug
Affects Versions: 0.9
 Environment: Windows (Cygwin and native)
Reporter: Fabian Hueske
Assignee: Fabian Hueske
Priority: Critical
 Fix For: 0.9


 Paths that contain a Windows drive letter such as {{file:///c:/my/directory}} 
 cannot be used as output path for {{FileOutputFormat}}.
 If done, the following exception is thrown:
 {code}
 Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: 
 Relative path in absolute URI: file:c:
 at org.apache.flink.core.fs.Path.initialize(Path.java:242)
 at org.apache.flink.core.fs.Path.<init>(Path.java:225)
 at org.apache.flink.core.fs.Path.<init>(Path.java:138)
 at 
 org.apache.flink.core.fs.local.LocalFileSystem.pathToFile(LocalFileSystem.java:147)
 at 
 org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:232)
 at 
 org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:233)
 at 
 org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:233)
 at 
 org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:233)
 at 
 org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:233)
 at 
 org.apache.flink.core.fs.FileSystem.initOutPathLocalFS(FileSystem.java:603)
 at 
 org.apache.flink.api.common.io.FileOutputFormat.open(FileOutputFormat.java:233)
 at 
 org.apache.flink.api.java.io.CsvOutputFormat.open(CsvOutputFormat.java:158)
 at 
 org.apache.flink.runtime.operators.DataSinkTask.invoke(DataSinkTask.java:183)
 at 
 org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:217)
 at java.lang.Thread.run(Unknown Source)
 Caused by: java.net.URISyntaxException: Relative path in absolute URI: file:c:
 at java.net.URI.checkPath(Unknown Source)
 at java.net.URI.<init>(Unknown Source)
 at org.apache.flink.core.fs.Path.initialize(Path.java:240)
 ... 14 more
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLINK-1735) Add FeatureHasher to machine learning library

2015-04-21 Thread Alexander Alexandrov (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Alexandrov reassigned FLINK-1735:
---

Assignee: Alexander Alexandrov

 Add FeatureHasher to machine learning library
 -

 Key: FLINK-1735
 URL: https://issues.apache.org/jira/browse/FLINK-1735
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann
Assignee: Alexander Alexandrov
  Labels: ML

 Using the hashing trick [1,2] is a common way to vectorize arbitrary feature 
 values. The hash of the feature value is used to calculate its index for a 
 vector entry. In order to mitigate possible collisions, a second hashing 
 function is used to calculate the sign for the update value which is added to 
 the vector entry. This way, it is likely that collision will simply cancel 
 out.
 A feature hasher would also be helpful for NLP problems where it could be 
 used to vectorize bag of words or ngrams feature vectors.
 Resources:
 [1] [https://en.wikipedia.org/wiki/Feature_hashing]
 [2] 
 [http://scikit-learn.org/stable/modules/feature_extraction.html#feature-extraction]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLINK-1743) Add multinomial logistic regression to machine learning library

2015-04-21 Thread Alexander Alexandrov (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Alexandrov reassigned FLINK-1743:
---

Assignee: Alexander Alexandrov

 Add multinomial logistic regression to machine learning library
 ---

 Key: FLINK-1743
 URL: https://issues.apache.org/jira/browse/FLINK-1743
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann
Assignee: Alexander Alexandrov
  Labels: ML

 Multinomial logistic regression [1] would be good first classification 
 algorithm which can classify multiple classes. 
 Resources:
 [1] [http://en.wikipedia.org/wiki/Multinomial_logistic_regression]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-1736) Add CountVectorizer to machine learning library

2015-04-21 Thread Alexander Alexandrov (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Alexandrov updated FLINK-1736:

Assignee: Alexander Alexandrov

 Add CountVectorizer to machine learning library
 ---

 Key: FLINK-1736
 URL: https://issues.apache.org/jira/browse/FLINK-1736
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann
Assignee: Alexander Alexandrov
  Labels: ML

 A {{CountVectorizer}} feature extractor [1] assigns each occurring word in a 
 corpus an unique identifier. With this mapping it can vectorize models such 
 as bag of words or ngrams in a efficient way. The unique identifier assigned 
 to a word acts as the index of a vector. The number of word occurrences is 
 represented as a vector value at a specific index. 
 The advantage of the {{CountVectorizer}} compared to the FeatureHasher is 
 that the mapping of words to indices can be obtained which makes it easier to 
 understand the resulting feature vectors.
 The {{CountVectorizer}} could be generalized to support arbitrary feature 
 values.
 The {{CountVectorizer}} should be implemented as a {{Transfomer}}.
 Resources:
 [1] 
 [http://scikit-learn.org/stable/modules/feature_extraction.html#common-vectorizer-usage]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLINK-1731) Add kMeans clustering algorithm to machine learning library

2015-04-21 Thread Alexander Alexandrov (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Alexandrov reassigned FLINK-1731:
---

Assignee: Alexander Alexandrov

 Add kMeans clustering algorithm to machine learning library
 ---

 Key: FLINK-1731
 URL: https://issues.apache.org/jira/browse/FLINK-1731
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library
Reporter: Till Rohrmann
Assignee: Alexander Alexandrov
  Labels: ML

 The Flink repository already contains a kMeans implementation but it is not 
 yet ported to the machine learning library. I assume that only the used data 
 types have to be adapted and then it can be more or less directly moved to 
 flink-ml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1829) Conflicting Jackson version in the Flink POMs

2015-04-13 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14492912#comment-14492912
 ] 

Alexander Alexandrov commented on FLINK-1829:
-

The client project includes (with provided scope) flink-scala, flink-java, and 
flink-clients. Here is the dependency tree for the problematic `emma-examples` 
project:

{{{
[INFO] 
[INFO] Building emma-sketchbook 1.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-dependency-plugin:2.1:tree (default-cli) @ emma-sketchbook ---
[INFO] eu.stratosphere:emma-sketchbook:jar:1.0-SNAPSHOT
[INFO] +- org.scala-lang:scala-library:jar:2.11.4:compile
[INFO] +- org.scala-lang:scala-reflect:jar:2.11.4:compile
[INFO] +- org.scala-lang:scala-compiler:jar:2.11.4:compile
[INFO] |  +- org.scala-lang.modules:scala-xml_2.11:jar:1.0.2:compile
[INFO] |  \- 
org.scala-lang.modules:scala-parser-combinators_2.11:jar:1.0.2:compile
[INFO] +- eu.stratosphere:emma-language:jar:1.0-SNAPSHOT:compile
[INFO] |  +- com.assembla.scala-incubator:graph-core_2.11:jar:1.9.0:compile
[INFO] |  +- eu.stratosphere:emma-backend:jar:1.0-SNAPSHOT:compile
[INFO] |  +- eu.stratosphere:emma-common:jar:1.0-SNAPSHOT:compile
[INFO] |  |  +- eu.stratosphere:emma-common-macros:jar:1.0-SNAPSHOT:compile
[INFO] |  |  +- net.sf.opencsv:opencsv:jar:2.3:compile
[INFO] |  |  \- 
com.typesafe.scala-logging:scala-logging-slf4j_2.11:jar:2.1.2:compile
[INFO] |  | \- 
com.typesafe.scala-logging:scala-logging-api_2.11:jar:2.1.2:compile
[INFO] |  +- org.apache.hadoop:hadoop-common:jar:2.2.0:compile
[INFO] |  |  +- org.apache.hadoop:hadoop-annotations:jar:2.2.0:compile
[INFO] |  |  |  \- jdk.tools:jdk.tools:jar:1.7:system
[INFO] |  |  +- commons-cli:commons-cli:jar:1.2:compile
[INFO] |  |  +- org.apache.commons:commons-math:jar:2.1:compile
[INFO] |  |  +- xmlenc:xmlenc:jar:0.52:compile
[INFO] |  |  +- commons-httpclient:commons-httpclient:jar:3.1:compile
[INFO] |  |  +- commons-codec:commons-codec:jar:1.4:compile
[INFO] |  |  +- commons-io:commons-io:jar:2.4:compile
[INFO] |  |  +- commons-net:commons-net:jar:2.2:compile
[INFO] |  |  +- javax.servlet:servlet-api:jar:2.5:compile
[INFO] |  |  +- org.mortbay.jetty:jetty:jar:6.1.26:compile
[INFO] |  |  +- org.mortbay.jetty:jetty-util:jar:6.1.26:compile
[INFO] |  |  +- com.sun.jersey:jersey-core:jar:1.9:compile
[INFO] |  |  +- com.sun.jersey:jersey-json:jar:1.9:compile
[INFO] |  |  |  +- org.codehaus.jettison:jettison:jar:1.1:compile
[INFO] |  |  |  |  \- stax:stax-api:jar:1.0.1:compile
[INFO] |  |  |  +- com.sun.xml.bind:jaxb-impl:jar:2.2.3-1:compile
[INFO] |  |  |  |  \- javax.xml.bind:jaxb-api:jar:2.2.2:compile
[INFO] |  |  |  | \- javax.activation:activation:jar:1.1:compile
[INFO] |  |  |  +- org.codehaus.jackson:jackson-jaxrs:jar:1.8.3:compile
[INFO] |  |  |  \- org.codehaus.jackson:jackson-xc:jar:1.8.3:compile
[INFO] |  |  +- com.sun.jersey:jersey-server:jar:1.9:compile
[INFO] |  |  |  \- asm:asm:jar:3.1:compile
[INFO] |  |  +- tomcat:jasper-compiler:jar:5.5.23:runtime
[INFO] |  |  +- commons-logging:commons-logging:jar:1.1.1:compile
[INFO] |  |  +- net.java.dev.jets3t:jets3t:jar:0.7.1:compile
[INFO] |  |  +- commons-lang:commons-lang:jar:2.5:compile
[INFO] |  |  +- commons-configuration:commons-configuration:jar:1.6:compile
[INFO] |  |  |  +- commons-collections:commons-collections:jar:3.2.1:compile
[INFO] |  |  |  +- commons-digester:commons-digester:jar:1.8:compile
[INFO] |  |  |  |  \- commons-beanutils:commons-beanutils:jar:1.7.0:compile
[INFO] |  |  |  \- commons-beanutils:commons-beanutils-core:jar:1.8.0:compile
[INFO] |  |  +- org.codehaus.jackson:jackson-core-asl:jar:1.8.8:compile
[INFO] |  |  +- org.codehaus.jackson:jackson-mapper-asl:jar:1.8.8:compile
[INFO] |  |  +- org.apache.avro:avro:jar:1.7.6:compile
[INFO] |  |  |  +- com.thoughtworks.paranamer:paranamer:jar:2.3:compile
[INFO] |  |  |  \- org.xerial.snappy:snappy-java:jar:1.1.1.6:compile
[INFO] |  |  +- com.google.protobuf:protobuf-java:jar:2.5.0:compile
[INFO] |  |  +- org.apache.hadoop:hadoop-auth:jar:2.2.0:compile
[INFO] |  |  +- com.jcraft:jsch:jar:0.1.42:compile
[INFO] |  |  +- org.apache.zookeeper:zookeeper:jar:3.4.5:compile
[INFO] |  |  \- org.apache.commons:commons-compress:jar:1.4.1:compile
[INFO] |  | \- org.tukaani:xz:jar:1.0:compile
[INFO] |  \- org.apache.hadoop:hadoop-hdfs:jar:2.2.0:compile
[INFO] | +- commons-daemon:commons-daemon:jar:1.0.13:compile
[INFO] | +- javax.servlet.jsp:jsp-api:jar:2.1:compile
[INFO] | \- tomcat:jasper-runtime:jar:5.5.23:compile
[INFO] |\- commons-el:commons-el:jar:1.0:compile
[INFO] +- eu.stratosphere:emma-flink:jar:1.0-SNAPSHOT:compile
[INFO] +- eu.stratosphere:emma-common:test-jar:tests:1.0-SNAPSHOT:test
[INFO] +- eu.stratosphere:emma-language:test-jar:tests:1.0-SNAPSHOT:test
[INFO] 

[jira] [Commented] (FLINK-1829) Conflicting Jackson version in the Flink POMs

2015-04-13 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14492965#comment-14492965
 ] 

Alexander Alexandrov commented on FLINK-1829:
-

Sorry, I thought that the two versions might share the same fully qualified 
class names even though they differ in the groupId, which is not the case. I 
checked again and I think I found the error.

I actually have Spark 1.2.0 with {{provided}} scope next to the Flink deps. 
From Spark I get 

{noformat}
[INFO] |  +- org.json4s:json4s-jackson_2.11:jar:3.2.10:provided
[INFO] |  |  +- org.json4s:json4s-core_2.11:jar:3.2.10:provided
[INFO] |  |  |  +- org.json4s:json4s-ast_2.11:jar:3.2.10:provided
[INFO] |  |  |  +- com.thoughtworks.paranamer:paranamer:jar:2.6:provided
[INFO] |  |  |  \- org.scala-lang:scalap:jar:2.11.0:provided
[INFO] |  |  \- com.fasterxml.jackson.core:jackson-databind:jar:2.3.1:provided
[INFO] |  | +- 
com.fasterxml.jackson.core:jackson-annotations:jar:2.3.0:provided
[INFO] |  | \- com.fasterxml.jackson.core:jackson-core:jar:2.3.1:provided
{noformat}

which conflicts with Flink's version of 
{{com.fasterxml.jackson.core:jackson-*}}. I guess I'll have to fix the issue 
locally. 

You can close this with 'nofix', thanks for the hint.



 Conflicting Jackson version in the Flink POMs
 -

 Key: FLINK-1829
 URL: https://issues.apache.org/jira/browse/FLINK-1829
 Project: Flink
  Issue Type: Bug
  Components: Build System
Affects Versions: 0.9
Reporter: Alexander Alexandrov
Assignee: Robert Metzger
 Fix For: 0.9


 The current POM setup transitively includes multiple conflicting versions of 
 the Jackson library over
 * {{com.amazonaws:aws-java-sdk}} (v. 2.1.1)
 * {{org.apache.avro:avro}} (v. 1.9.13)
 * {{org.apache.hbase:hbase-client}} (v. 1.8.8)
 When running jobs against a Flink local runtime embedded with:
 {code:xml}
 dependency
 groupIdorg.apache.flink/groupId
 artifactIdflink-scala/artifactId
 version${flink.version}/version
 scopeprovided/scope
 /dependency
 dependency
 groupIdorg.apache.flink/groupId
 artifactIdflink-java/artifactId
 version${flink.version}/version
 scopeprovided/scope
 /dependency
 dependency
 groupIdorg.apache.flink/groupId
 artifactIdflink-clients/artifactId
 version${flink.version}/version
 scopeprovided/scope
 /dependency
 {code}
 I get the following error:
 {noformat}
 15-04-04 15:52:04 ERROR exception during creation
 akka.actor.ActorInitializationException: exception during creation
   at akka.actor.ActorInitializationException$.apply(Actor.scala:164)
   at akka.actor.ActorCell.create(ActorCell.scala:596)
   at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456)
   at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
   at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:279)
   at akka.dispatch.Mailbox.run(Mailbox.scala:220)
   at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
   at 
 scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
   at 
 scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
   at 
 scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 Caused by: java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
   at akka.util.Reflect$.instantiate(Reflect.scala:66)
   at akka.actor.ArgsReflectConstructor.produce(Props.scala:352)
   at akka.actor.Props.newActor(Props.scala:252)
   at akka.actor.ActorCell.newActor(ActorCell.scala:552)
   at akka.actor.ActorCell.create(ActorCell.scala:578)
   ... 9 more
 Caused by: java.lang.NoSuchMethodError: 
 com.fasterxml.jackson.core.JsonFactory.requiresPropertyOrdering()Z
   at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:445)
   at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:366)
   at org.apache.flink.runtime.taskmanager.TaskManager.<init>(TaskManager.scala:134)
   ... 18 more
 {noformat}
 Fixing the Jackson version on the client side, e.g., with the following snippet
 {code:xml}
 <dependency>
     <groupId>com.fasterxml.jackson.core</groupId>
     <artifactId>jackson-core</artifactId>
     <version>2.2.1</version>
     <scope>provided</scope>
 </dependency>
 <dependency>
     <groupId>com.fasterxml.jackson.core</groupId>
     <artifactId>jackson-databind</artifactId>
     <version>2.2.1</version>
     <scope>provided</scope>
 </dependency>
 <dependency>
     <groupId>com.fasterxml.jackson.core</groupId>
     <artifactId>jackson-annotations</artifactId>
     <version>2.2.1</version>
     <scope>provided</scope>
 </dependency>
 {code}
 solves the problem, but I guess it will be better if we can stick with one 
 version in the build artifacts.

[jira] [Comment Edited] (FLINK-1829) Conflicting Jackson version in the Flink POMs

2015-04-13 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14492912#comment-14492912
 ] 

Alexander Alexandrov edited comment on FLINK-1829 at 4/13/15 7:21 PM:
--

The client project includes (with provided scope) flink-scala, flink-java, and 
flink-clients. Here is the dependency tree for the problematic `emma-examples` 
project:

{noformat}
[INFO] 
[INFO] Building emma-sketchbook 1.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-dependency-plugin:2.1:tree (default-cli) @ emma-sketchbook ---
[INFO] eu.stratosphere:emma-sketchbook:jar:1.0-SNAPSHOT
[INFO] +- org.scala-lang:scala-library:jar:2.11.4:compile
[INFO] +- org.scala-lang:scala-reflect:jar:2.11.4:compile
[INFO] +- org.scala-lang:scala-compiler:jar:2.11.4:compile
[INFO] |  +- org.scala-lang.modules:scala-xml_2.11:jar:1.0.2:compile
[INFO] |  \- 
org.scala-lang.modules:scala-parser-combinators_2.11:jar:1.0.2:compile
[INFO] +- eu.stratosphere:emma-language:jar:1.0-SNAPSHOT:compile
[INFO] |  +- com.assembla.scala-incubator:graph-core_2.11:jar:1.9.0:compile
[INFO] |  +- eu.stratosphere:emma-backend:jar:1.0-SNAPSHOT:compile
[INFO] |  +- eu.stratosphere:emma-common:jar:1.0-SNAPSHOT:compile
[INFO] |  |  +- eu.stratosphere:emma-common-macros:jar:1.0-SNAPSHOT:compile
[INFO] |  |  +- net.sf.opencsv:opencsv:jar:2.3:compile
[INFO] |  |  \- 
com.typesafe.scala-logging:scala-logging-slf4j_2.11:jar:2.1.2:compile
[INFO] |  | \- 
com.typesafe.scala-logging:scala-logging-api_2.11:jar:2.1.2:compile
[INFO] |  +- org.apache.hadoop:hadoop-common:jar:2.2.0:compile
[INFO] |  |  +- org.apache.hadoop:hadoop-annotations:jar:2.2.0:compile
[INFO] |  |  |  \- jdk.tools:jdk.tools:jar:1.7:system
[INFO] |  |  +- commons-cli:commons-cli:jar:1.2:compile
[INFO] |  |  +- org.apache.commons:commons-math:jar:2.1:compile
[INFO] |  |  +- xmlenc:xmlenc:jar:0.52:compile
[INFO] |  |  +- commons-httpclient:commons-httpclient:jar:3.1:compile
[INFO] |  |  +- commons-codec:commons-codec:jar:1.4:compile
[INFO] |  |  +- commons-io:commons-io:jar:2.4:compile
[INFO] |  |  +- commons-net:commons-net:jar:2.2:compile
[INFO] |  |  +- javax.servlet:servlet-api:jar:2.5:compile
[INFO] |  |  +- org.mortbay.jetty:jetty:jar:6.1.26:compile
[INFO] |  |  +- org.mortbay.jetty:jetty-util:jar:6.1.26:compile
[INFO] |  |  +- com.sun.jersey:jersey-core:jar:1.9:compile
[INFO] |  |  +- com.sun.jersey:jersey-json:jar:1.9:compile
[INFO] |  |  |  +- org.codehaus.jettison:jettison:jar:1.1:compile
[INFO] |  |  |  |  \- stax:stax-api:jar:1.0.1:compile
[INFO] |  |  |  +- com.sun.xml.bind:jaxb-impl:jar:2.2.3-1:compile
[INFO] |  |  |  |  \- javax.xml.bind:jaxb-api:jar:2.2.2:compile
[INFO] |  |  |  | \- javax.activation:activation:jar:1.1:compile
[INFO] |  |  |  +- org.codehaus.jackson:jackson-jaxrs:jar:1.8.3:compile
[INFO] |  |  |  \- org.codehaus.jackson:jackson-xc:jar:1.8.3:compile
[INFO] |  |  +- com.sun.jersey:jersey-server:jar:1.9:compile
[INFO] |  |  |  \- asm:asm:jar:3.1:compile
[INFO] |  |  +- tomcat:jasper-compiler:jar:5.5.23:runtime
[INFO] |  |  +- commons-logging:commons-logging:jar:1.1.1:compile
[INFO] |  |  +- net.java.dev.jets3t:jets3t:jar:0.7.1:compile
[INFO] |  |  +- commons-lang:commons-lang:jar:2.5:compile
[INFO] |  |  +- commons-configuration:commons-configuration:jar:1.6:compile
[INFO] |  |  |  +- commons-collections:commons-collections:jar:3.2.1:compile
[INFO] |  |  |  +- commons-digester:commons-digester:jar:1.8:compile
[INFO] |  |  |  |  \- commons-beanutils:commons-beanutils:jar:1.7.0:compile
[INFO] |  |  |  \- commons-beanutils:commons-beanutils-core:jar:1.8.0:compile
[INFO] |  |  +- org.codehaus.jackson:jackson-core-asl:jar:1.8.8:compile
[INFO] |  |  +- org.codehaus.jackson:jackson-mapper-asl:jar:1.8.8:compile
[INFO] |  |  +- org.apache.avro:avro:jar:1.7.6:compile
[INFO] |  |  |  +- com.thoughtworks.paranamer:paranamer:jar:2.3:compile
[INFO] |  |  |  \- org.xerial.snappy:snappy-java:jar:1.1.1.6:compile
[INFO] |  |  +- com.google.protobuf:protobuf-java:jar:2.5.0:compile
[INFO] |  |  +- org.apache.hadoop:hadoop-auth:jar:2.2.0:compile
[INFO] |  |  +- com.jcraft:jsch:jar:0.1.42:compile
[INFO] |  |  +- org.apache.zookeeper:zookeeper:jar:3.4.5:compile
[INFO] |  |  \- org.apache.commons:commons-compress:jar:1.4.1:compile
[INFO] |  | \- org.tukaani:xz:jar:1.0:compile
[INFO] |  \- org.apache.hadoop:hadoop-hdfs:jar:2.2.0:compile
[INFO] | +- commons-daemon:commons-daemon:jar:1.0.13:compile
[INFO] | +- javax.servlet.jsp:jsp-api:jar:2.1:compile
[INFO] | \- tomcat:jasper-runtime:jar:5.5.23:compile
[INFO] |\- commons-el:commons-el:jar:1.0:compile
[INFO] +- eu.stratosphere:emma-flink:jar:1.0-SNAPSHOT:compile
[INFO] +- eu.stratosphere:emma-common:test-jar:tests:1.0-SNAPSHOT:test
[INFO] +- 

[jira] [Comment Edited] (FLINK-1829) Conflicting Jackson version in the Flink POMs

2015-04-13 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14492932#comment-14492932
 ] 

Alexander Alexandrov edited comment on FLINK-1829 at 4/13/15 7:33 PM:
--

Correct, HBase is out of scope, but we still get both jackson 2.1.1 and 1.9.13 
through the {{com.amazonaws:aws-java-sdk}} and {{org.apache.avro:avro}} 
dependencies respectively.


was (Author: aalexandrov):
Correct, HBase is out of scope, but we still get both jackson 2.1.1 and 1.9.13 
through the {com.amazonaws:aws-java-sdk} and {org.apache.avro:avro} 
dependencies respectively.

 Conflicting Jackson version in the Flink POMs
 -

 Key: FLINK-1829
 URL: https://issues.apache.org/jira/browse/FLINK-1829
 Project: Flink
  Issue Type: Bug
  Components: Build System
Affects Versions: 0.9
Reporter: Alexander Alexandrov
Assignee: Robert Metzger
 Fix For: 0.9


 The current POM setup transitively includes multiple conflicting versions of 
 the Jackson library over
 * {{com.amazonaws:aws-java-sdk}} (v. 2.1.1)
 * {{org.apache.avro:avro}} (v. 1.9.13)
 * {{org.apache.hbase:hbase-client}} (v. 1.8.8)
 When running jobs against a Flink local runtime embedded with:
 {code:xml}
 <dependency>
     <groupId>org.apache.flink</groupId>
     <artifactId>flink-scala</artifactId>
     <version>${flink.version}</version>
     <scope>provided</scope>
 </dependency>
 <dependency>
     <groupId>org.apache.flink</groupId>
     <artifactId>flink-java</artifactId>
     <version>${flink.version}</version>
     <scope>provided</scope>
 </dependency>
 <dependency>
     <groupId>org.apache.flink</groupId>
     <artifactId>flink-clients</artifactId>
     <version>${flink.version}</version>
     <scope>provided</scope>
 </dependency>
 {code}
 I get the following error:
 {noformat}
 15-04-04 15:52:04 ERROR exception during creation
 akka.actor.ActorInitializationException: exception during creation
   at akka.actor.ActorInitializationException$.apply(Actor.scala:164)
   at akka.actor.ActorCell.create(ActorCell.scala:596)
   at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456)
   at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
   at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:279)
   at akka.dispatch.Mailbox.run(Mailbox.scala:220)
   at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
   at 
 scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
   at 
 scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
   at 
 scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 Caused by: java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
   at akka.util.Reflect$.instantiate(Reflect.scala:66)
   at akka.actor.ArgsReflectConstructor.produce(Props.scala:352)
   at akka.actor.Props.newActor(Props.scala:252)
   at akka.actor.ActorCell.newActor(ActorCell.scala:552)
   at akka.actor.ActorCell.create(ActorCell.scala:578)
   ... 9 more
 Caused by: java.lang.NoSuchMethodError: 
 com.fasterxml.jackson.core.JsonFactory.requiresPropertyOrdering()Z
   at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:445)
   at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:366)
   at org.apache.flink.runtime.taskmanager.TaskManager.<init>(TaskManager.scala:134)
   ... 18 more
 {noformat}
 Fixing the Jackson version on the client side, e.g., with the following snippet
 {code:xml}
 <dependency>
     <groupId>com.fasterxml.jackson.core</groupId>
     <artifactId>jackson-core</artifactId>
     <version>2.2.1</version>
     <scope>provided</scope>
 </dependency>
 <dependency>
     <groupId>com.fasterxml.jackson.core</groupId>
     <artifactId>jackson-databind</artifactId>
     <version>2.2.1</version>
     <scope>provided</scope>
 </dependency>
 <dependency>
     <groupId>com.fasterxml.jackson.core</groupId>
     <artifactId>jackson-annotations</artifactId>
     <version>2.2.1</version>
     <scope>provided</scope>
 </dependency>
 {code}
 solves the problem, but I guess it will be better if we can stick with one 
 version in the build artifacts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1829) Conflicting Jackson version in the Flink POMs

2015-04-13 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14492938#comment-14492938
 ] 

Alexander Alexandrov commented on FLINK-1829:
-

I am actually running from IntelliJ using a TestCase that starts the job. I'll 
post the classpath with and without the {{jackson}} enforcement on the client 
POM in a minute.

 Conflicting Jackson version in the Flink POMs
 -

 Key: FLINK-1829
 URL: https://issues.apache.org/jira/browse/FLINK-1829
 Project: Flink
  Issue Type: Bug
  Components: Build System
Affects Versions: 0.9
Reporter: Alexander Alexandrov
Assignee: Robert Metzger
 Fix For: 0.9


 The current POM setup transitively includes multiple conflicting versions of 
 the Jackson library over
 * {{com.amazonaws:aws-java-sdk}} (v. 2.1.1)
 * {{org.apache.avro:avro}} (v. 1.9.13)
 * {{org.apache.hbase:hbase-client}} (v. 1.8.8)
 When running jobs against a Flink local runtime embedded with:
 {code:xml}
 <dependency>
     <groupId>org.apache.flink</groupId>
     <artifactId>flink-scala</artifactId>
     <version>${flink.version}</version>
     <scope>provided</scope>
 </dependency>
 <dependency>
     <groupId>org.apache.flink</groupId>
     <artifactId>flink-java</artifactId>
     <version>${flink.version}</version>
     <scope>provided</scope>
 </dependency>
 <dependency>
     <groupId>org.apache.flink</groupId>
     <artifactId>flink-clients</artifactId>
     <version>${flink.version}</version>
     <scope>provided</scope>
 </dependency>
 {code}
 I get the following error:
 {noformat}
 15-04-04 15:52:04 ERROR exception during creation
 akka.actor.ActorInitializationException: exception during creation
   at akka.actor.ActorInitializationException$.apply(Actor.scala:164)
   at akka.actor.ActorCell.create(ActorCell.scala:596)
   at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456)
   at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
   at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:279)
   at akka.dispatch.Mailbox.run(Mailbox.scala:220)
   at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
   at 
 scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
   at 
 scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
   at 
 scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 Caused by: java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
   at akka.util.Reflect$.instantiate(Reflect.scala:66)
   at akka.actor.ArgsReflectConstructor.produce(Props.scala:352)
   at akka.actor.Props.newActor(Props.scala:252)
   at akka.actor.ActorCell.newActor(ActorCell.scala:552)
   at akka.actor.ActorCell.create(ActorCell.scala:578)
   ... 9 more
 Caused by: java.lang.NoSuchMethodError: 
 com.fasterxml.jackson.core.JsonFactory.requiresPropertyOrdering()Z
   at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:445)
   at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:366)
   at org.apache.flink.runtime.taskmanager.TaskManager.<init>(TaskManager.scala:134)
   ... 18 more
 {noformat}
 Fixing the Jackson version on the client side, e.g., with the following snippet
 {code:xml}
 <dependency>
     <groupId>com.fasterxml.jackson.core</groupId>
     <artifactId>jackson-core</artifactId>
     <version>2.2.1</version>
     <scope>provided</scope>
 </dependency>
 <dependency>
     <groupId>com.fasterxml.jackson.core</groupId>
     <artifactId>jackson-databind</artifactId>
     <version>2.2.1</version>
     <scope>provided</scope>
 </dependency>
 <dependency>
     <groupId>com.fasterxml.jackson.core</groupId>
     <artifactId>jackson-annotations</artifactId>
     <version>2.2.1</version>
     <scope>provided</scope>
 </dependency>
 {code}
 solves the problem, but I guess it will be better if we can stick with one 
 version in the build artifacts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1829) Conflicting Jackson version in the Flink POMs

2015-04-13 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14492932#comment-14492932
 ] 

Alexander Alexandrov commented on FLINK-1829:
-

Correct, HBase is out of scope, but we still get both jackson 2.1.1 and 1.9.13 
through the {com.amazonaws:aws-java-sdk} and {org.apache.avro:avro} 
dependencies respectively.

 Conflicting Jackson version in the Flink POMs
 -

 Key: FLINK-1829
 URL: https://issues.apache.org/jira/browse/FLINK-1829
 Project: Flink
  Issue Type: Bug
  Components: Build System
Affects Versions: 0.9
Reporter: Alexander Alexandrov
Assignee: Robert Metzger
 Fix For: 0.9


 The current POM setup transitively includes multiple conflicting versions of 
 the Jackson library over
 * {{com.amazonaws:aws-java-sdk}} (v. 2.1.1)
 * {{org.apache.avro:avro}} (v. 1.9.13)
 * {{org.apache.hbase:hbase-client}} (v. 1.8.8)
 When running jobs against a Flink local runtime embedded with:
 {code:xml}
 <dependency>
     <groupId>org.apache.flink</groupId>
     <artifactId>flink-scala</artifactId>
     <version>${flink.version}</version>
     <scope>provided</scope>
 </dependency>
 <dependency>
     <groupId>org.apache.flink</groupId>
     <artifactId>flink-java</artifactId>
     <version>${flink.version}</version>
     <scope>provided</scope>
 </dependency>
 <dependency>
     <groupId>org.apache.flink</groupId>
     <artifactId>flink-clients</artifactId>
     <version>${flink.version}</version>
     <scope>provided</scope>
 </dependency>
 {code}
 I get the following error:
 {noformat}
 15-04-04 15:52:04 ERROR exception during creation
 akka.actor.ActorInitializationException: exception during creation
   at akka.actor.ActorInitializationException$.apply(Actor.scala:164)
   at akka.actor.ActorCell.create(ActorCell.scala:596)
   at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456)
   at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
   at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:279)
   at akka.dispatch.Mailbox.run(Mailbox.scala:220)
   at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
   at 
 scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
   at 
 scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
   at 
 scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 Caused by: java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
   at akka.util.Reflect$.instantiate(Reflect.scala:66)
   at akka.actor.ArgsReflectConstructor.produce(Props.scala:352)
   at akka.actor.Props.newActor(Props.scala:252)
   at akka.actor.ActorCell.newActor(ActorCell.scala:552)
   at akka.actor.ActorCell.create(ActorCell.scala:578)
   ... 9 more
 Caused by: java.lang.NoSuchMethodError: 
 com.fasterxml.jackson.core.JsonFactory.requiresPropertyOrdering()Z
   at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:445)
   at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:366)
   at org.apache.flink.runtime.taskmanager.TaskManager.<init>(TaskManager.scala:134)
   ... 18 more
 {noformat}
 Fixing the Jackson version on the client side, e.g., with the following snippet
 {code:xml}
 <dependency>
     <groupId>com.fasterxml.jackson.core</groupId>
     <artifactId>jackson-core</artifactId>
     <version>2.2.1</version>
     <scope>provided</scope>
 </dependency>
 <dependency>
     <groupId>com.fasterxml.jackson.core</groupId>
     <artifactId>jackson-databind</artifactId>
     <version>2.2.1</version>
     <scope>provided</scope>
 </dependency>
 <dependency>
     <groupId>com.fasterxml.jackson.core</groupId>
     <artifactId>jackson-annotations</artifactId>
     <version>2.2.1</version>
     <scope>provided</scope>
 </dependency>
 {code}
 solves the problem, but I guess it will be better if we can stick with one 
 version in the build artifacts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-1829) Conflicting Jackson version in the Flink POMs

2015-04-04 Thread Alexander Alexandrov (JIRA)
Alexander Alexandrov created FLINK-1829:
---

 Summary: Conflicting Jackson version in the Flink POMs
 Key: FLINK-1829
 URL: https://issues.apache.org/jira/browse/FLINK-1829
 Project: Flink
  Issue Type: Bug
  Components: Build System
Affects Versions: 0.9
Reporter: Alexander Alexandrov
 Fix For: 0.9


The current POM setup transitively includes multiple conflicting versions of 
the Jackson library over

* {{com.amazonaws:aws-java-sdk}} (v. 2.1.1)
* {{org.apache.avro:avro}} (v. 1.9.13)
* {{org.apache.hbase:hbase-client}} (v. 1.8.8)

When running jobs against a Flink local runtime embedded with:

{code:xml}
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-scala</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
{code}

I get the following error:

{noformat}
15-04-04 15:52:04 ERROR exception during creation
akka.actor.ActorInitializationException: exception during creation
at akka.actor.ActorInitializationException$.apply(Actor.scala:164)
at akka.actor.ActorCell.create(ActorCell.scala:596)
at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456)
at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:279)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at akka.util.Reflect$.instantiate(Reflect.scala:66)
at akka.actor.ArgsReflectConstructor.produce(Props.scala:352)
at akka.actor.Props.newActor(Props.scala:252)
at akka.actor.ActorCell.newActor(ActorCell.scala:552)
at akka.actor.ActorCell.create(ActorCell.scala:578)
... 9 more
Caused by: java.lang.NoSuchMethodError: 
com.fasterxml.jackson.core.JsonFactory.requiresPropertyOrdering()Z
at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:445)
at com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:366)
at org.apache.flink.runtime.taskmanager.TaskManager.<init>(TaskManager.scala:134)
... 18 more
{noformat}

Fixing the Jackson version on the client side, e.g., with the following snippet

{code:xml}
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-core</artifactId>
    <version>2.2.1</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.2.1</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-annotations</artifactId>
    <version>2.2.1</version>
    <scope>provided</scope>
</dependency>
{code}

solves the problem, but I guess it will be better if we can stick with one 
version in the build artifacts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-967) Make intermediate results a first-class citizen in the JobGraph

2015-04-02 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392980#comment-14392980
 ] 

Alexander Alexandrov commented on FLINK-967:


[~StephanEwen] what is the status on this?

 Make intermediate results a first-class citizen in the JobGraph
 ---

 Key: FLINK-967
 URL: https://issues.apache.org/jira/browse/FLINK-967
 Project: Flink
  Issue Type: New Feature
  Components: JobManager, TaskManager
Affects Versions: 0.6-incubating
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.9


 In order to add incremental plan rollout to the system, we need to make 
 intermediate results a first-class citizen in the job graph and scheduler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (FLINK-1789) Allow adding of URLs to the usercode class loader

2015-04-01 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390800#comment-14390800
 ] 

Alexander Alexandrov edited comment on FLINK-1789 at 4/1/15 3:29 PM:
-

The problem with that approach is that the code loaded by the regular JVM class 
loader cannot refer to job specific types (which can be accessed only at the  
UserCodeClassLoader level). Unfortunately, this is the case if we use the 
classpath entry to generate the dataflows dynamically at runtime.

My original hack was therefore to hardcode a filesystem path next to the list 
of jars when initializing the BlobManager entry, and I wanted to open an issue 
to make this list parameterizable via an additional ExecutionEnvironment 
argument (this is basically the main missing feature that prevents the use of 
Emma with off-the-shelf Flink).

This, of course, would require that the folders are shared (e.g. via NFS) 
between client, master and workers. I think what made Stephan so excited is the 
idea of using the same URL mechanism in order to ship the code to all dependent 
parties (most probably by running a dedicated HTTP or FTP server on the client).
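
Purely as an illustration (the signature below is hypothetical, not an existing 
RemoteEnvironment API), the parameterization I have in mind would look roughly 
like this:

{code:java}
// Hypothetical sketch: next to the jars that are shipped via the BlobManager,
// pass a list of URLs (e.g. a shared NFS directory with generated classes)
// that should simply be added to the usercode class loader on all nodes.
URL sharedClassDir = new URL("file:///mnt/nfs/emma/generated-classes/");
ExecutionEnvironment env = ExecutionEnvironment.createRemoteEnvironment(
        "jobmanager-host", 6123,
        new String[] { "emma-job.jar" },   // shipped to the cluster
        new URL[] { sharedClassDir });     // assumed to be reachable on all nodes
{code}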


was (Author: aalexandrov):
The problem with that approach is that the code loaded by the regular JVM class 
loader cannot refer to job specific types (which can be accessed only at the  
UserCodeClassLoader level). Unfortunately, this is the case if we use the 
classpath entry to generate the dataflows dynamically at runtime.

My original hack was to therefore hardcode a filesystem path next to the list 
of jars when initializing the BlobManager, and I wanted to open an issue which 
makes this configurable when initializing the execution environment (this is 
basically the only main feature which prohibits the use of Emma with 
off-the-shelf Flink).

This, of course, would require that the folders are shared (e.g. via NFS) 
between client, master and workers. I think what made Stephan so excited is the 
idea of using the same URL mechanism in order to ship the code to all dependent 
parties (most probably by running a dedicated HTTP or FTP server on the client).

 Allow adding of URLs to the usercode class loader
 -

 Key: FLINK-1789
 URL: https://issues.apache.org/jira/browse/FLINK-1789
 Project: Flink
  Issue Type: Improvement
  Components: Distributed Runtime
Reporter: Timo Walther
Assignee: Timo Walther
Priority: Minor

 Currently, there is no option to add customs classpath URLs to the 
 FlinkUserCodeClassLoader. JARs always need to be shipped to the cluster even 
 if they are already present on all nodes.
 It would be great if RemoteEnvironment also accepts valid classpaths URLs and 
 forwards them to BlobLibraryCacheManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1741) Add Jaccard Similarity Metric Example

2015-03-18 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367867#comment-14367867
 ] 

Alexander Alexandrov commented on FLINK-1741:
-

I think we had some similar code for this in one of the ETL jobs in the IMPRO 
repository.
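
For reference, the Jaccard similarity of two vertices u and v is usually defined 
as {{|N(u) ∩ N(v)| / |N(u) ∪ N(v)|}}, where {{N(x)}} denotes the neighbor set of 
x.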

 Add Jaccard Similarity Metric Example
 -

 Key: FLINK-1741
 URL: https://issues.apache.org/jira/browse/FLINK-1741
 Project: Flink
  Issue Type: Task
  Components: Gelly
Affects Versions: 0.9
Reporter: Andra Lungu
Assignee: Andra Lungu

 http://www.inside-r.org/packages/cran/igraph/docs/similarity



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1713) Add support for blocking data exchange in closed loop iterations

2015-03-17 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14366205#comment-14366205
 ] 

Alexander Alexandrov commented on FLINK-1713:
-

I think he means Flink's native iteration support, i.e. execution plans that 
are wrapped in IterationHead and IterationTail operators.
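
For context, a rough sketch of such a closed loop using the DataSet API's bulk 
iteration (types and the iteration body are illustrative only):

{code:java}
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.operators.IterativeDataSet;

// The iterate()/closeWith() pair below is what gets translated into the
// IterationHead/IterationTail operators mentioned above.
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
IterativeDataSet<Long> loop = env.fromElements(0L).iterate(10);
DataSet<Long> nextPartial = loop.map(new MapFunction<Long, Long>() {
    @Override
    public Long map(Long value) {
        return value + 1;
    }
});
DataSet<Long> result = loop.closeWith(nextPartial);
{code}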

 Add support for blocking data exchange in closed loop iterations
 

 Key: FLINK-1713
 URL: https://issues.apache.org/jira/browse/FLINK-1713
 Project: Flink
  Issue Type: Improvement
  Components: Distributed Runtime
Affects Versions: master
Reporter: Ufuk Celebi
Priority: Minor

 The way that blocking intermediate results are currently managed prohibits 
 them from being used inside of closed loops.
 A blocking result has to be fully produced before its receivers are deployed 
 and there is no notion of single iterations etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-1613) Cannot submit to remote ExecutionEnvironment from IDE

2015-02-26 Thread Alexander Alexandrov (JIRA)
Alexander Alexandrov created FLINK-1613:
---

 Summary: Cannot submit to remote ExecutionEnvironment from IDE
 Key: FLINK-1613
 URL: https://issues.apache.org/jira/browse/FLINK-1613
 Project: Flink
  Issue Type: Bug
  Components: Distributed Runtime
Affects Versions: 0.8.1
 Environment: * Ubuntu Linux 14.04
* Flink 0.9-SNAPSHOT or 0.8.1 running in standalone mode on localhost
Reporter: Alexander Alexandrov
 Fix For: 0.9, 0.8.2


I am reporting this as [~rmetzler] mentioned offline that it was working in the 
past.

At the moment it is not possible to submit jobs directly from the IDE. Both the 
Java and the Scala quickstart guides fail on both 0.8.1 and 0.9-SNAPSHOT with 
ClassNotFoundException exceptions.

To reproduce the error, run the quickstart scripts and change the 
ExecutionEnvironment initialization:

{code:java}
env = ExecutionEnvironment.createRemoteEnvironment("localhost", 6123)
{code}


This is the cause for Java:

{noformat}
Caused by: java.lang.ClassNotFoundException: 
org.myorg.quickstart.WordCount$LineSplitter
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:274)
at 
org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:54)
at 
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at 
org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:274)
at 
org.apache.flink.util.InstantiationUtil.readObjectFromConfig(InstantiationUtil.java:236)
at 
org.apache.flink.runtime.operators.util.TaskConfig.getStubWrapper(TaskConfig.java:281)
{noformat}

This is for Scala:

{noformat}
java.lang.ClassNotFoundException: org.myorg.quickstart.WordCount$$anon$2$$anon$1
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:274)
at 
org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:54)
at 
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at 
org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:274)
at 
org.apache.flink.util.InstantiationUtil.readObjectFromConfig(InstantiationUtil.java:236)
at 
org.apache.flink.api.java.typeutils.runtime.RuntimeSerializerFactory.readParametersFromConfig(RuntimeSerializerFactory.java:76)
at 
org.apache.flink.runtime.operators.util.TaskConfig.getTypeSerializerFactory(TaskConfig.java:1084)
{noformat}
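
As a side note, a possible workaround (a sketch only; it assumes the program is 
first packaged as a jar, and the path below is made up) is to pass the user-code 
jar explicitly so that the classes are shipped to the cluster:

{code:java}
import org.apache.flink.api.java.ExecutionEnvironment;

// Sketch: build the jar first (e.g. via `mvn package`), then reference it here
// so the quickstart classes can be resolved on the TaskManagers.
ExecutionEnvironment env = ExecutionEnvironment.createRemoteEnvironment(
        "localhost", 6123, "target/quickstart-0.1.jar");
{code}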



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-1464) Added ResultTypeQueryable interface to TypeSerializerInputFormat.

2015-01-30 Thread Alexander Alexandrov (JIRA)
Alexander Alexandrov created FLINK-1464:
---

 Summary: Added ResultTypeQueryable interface to 
TypeSerializerInputFormat.
 Key: FLINK-1464
 URL: https://issues.apache.org/jira/browse/FLINK-1464
 Project: Flink
  Issue Type: Improvement
  Components: Distributed Runtime, Optimizer
Affects Versions: 0.8, 0.9, 0.8.1
Reporter: Alexander Alexandrov
Assignee: Alexander Alexandrov
Priority: Minor
 Fix For: 0.9, 0.8.1


It is currently impossible to use the {{TypeSerializerInputFormat}} with 
generic Tuple types.

For example, [this example 
gist|https://gist.github.com/aalexandrov/90bf21f66bf604676f37] fails with a

{quote}
 Exception in thread "main" org.apache.flink.api.common.InvalidProgramException: 
The type returned by the input format could not be automatically determined. 
Please specify the TypeInformation of the produced type explicitly.
at 
org.apache.flink.api.java.ExecutionEnvironment.readFile(ExecutionEnvironment.java:341)
at SerializedFormatExample$.main(SerializedFormatExample.scala:48)
at SerializedFormatExample.main(SerializedFormatExample.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
{quote}

exception. 

To fix the issue, I changed the constructor to take a {{TypeInformation<T>}} 
instead of a {{TypeSerializer<T>}} argument. If this is indeed a bug, I think 
that this is a good solution. 

Unfortunately the fix breaks the API. Feel free to change it if you find a more 
elegant solution compatible with the 0.8 branch.

The suggested fix can be found in the GitHub 
[PR#349|https://github.com/apache/flink/pull/349].
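
For illustration, a minimal sketch (assuming the TypeInformation-based 
constructor described above; the tuple type and path are made up) of how the 
input format would then be used:

{code:java}
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.TypeSerializerInputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.typeutils.TupleTypeInfo;

// With a TypeInformation-based constructor the format can report its produced
// type, so readFile() no longer needs the type to be specified explicitly.
TypeInformation<Tuple2<Long, String>> typeInfo =
        new TupleTypeInfo<Tuple2<Long, String>>(
                BasicTypeInfo.LONG_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO);
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSet<Tuple2<Long, String>> data = env.readFile(
        new TypeSerializerInputFormat<Tuple2<Long, String>>(typeInfo),
        "file:///tmp/serialized-tuples");
{code}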



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1422) Missing usage example for withParameters

2015-01-29 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14296986#comment-14296986
 ] 

Alexander Alexandrov commented on FLINK-1422:
-

+1 from me as well.

I recently found out that one can also use constructor arguments to pass UDF
parameters from the driver if the UDF is defined as a static final class
(i.e. the UDF instance can be serialized).

I think that this should be covered here as well for the sake of
completeness.
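
A minimal sketch of the constructor-argument variant (class and field names are 
illustrative):

{code:java}
import org.apache.flink.api.common.functions.FilterFunction;

// The parameter is passed via the constructor and captured in a field; since
// the UDF is a static nested class, the instance is serialized with the job.
public static final class ThresholdFilter implements FilterFunction<Integer> {
    private final int threshold;

    public ThresholdFilter(int threshold) {
        this.threshold = threshold;
    }

    @Override
    public boolean filter(Integer value) {
        return value > threshold;
    }
}

// In the driver program:
// data.filter(new ThresholdFilter(42));
{code}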




 Missing usage example for withParameters
 --

 Key: FLINK-1422
 URL: https://issues.apache.org/jira/browse/FLINK-1422
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 0.8
Reporter: Alexander Alexandrov
Assignee: Chesnay Schepler
Priority: Trivial
 Fix For: 0.8.1

   Original Estimate: 1h
  Remaining Estimate: 1h

 I am struggling to find a usage example of the withParameters method in the 
 documentation. At the moment I only see this note:
 {quote}
 Note: As the content of broadcast variables is kept in-memory on each node, 
 it should not become too large. For simpler things like scalar values you can 
 simply make parameters part of the closure of a function, or use the 
 withParameters(...) method to pass in a configuration.
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1427) Configuration through environment variables

2015-01-21 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285744#comment-14285744
 ] 

Alexander Alexandrov commented on FLINK-1427:
-

I am not too fond of this idea.

I like having one clean entry point to the configuration of the system and a 
clear section documenting how I can change that configuration. One of the main 
issues I always have when dealing with Spark / Hadoop is deciding where and how 
to configure what.

We actually had an environment based approach in the project in past, but 
removed it in order to simplify the configuration options and have them all in 
one place.

I suggest thinking again about this one, as it will also require at least:

(1) making sure that the documentation explains that there are multiple 
equivalent setup options;
(2) ensuring that these setup options are actually equivalent.

 Configuration through environment variables
 ---

 Key: FLINK-1427
 URL: https://issues.apache.org/jira/browse/FLINK-1427
 Project: Flink
  Issue Type: Improvement
  Components: Local Runtime
 Environment: Deployment
Reporter: Max Michels
Priority: Minor
  Labels: configuration, deployment

 Like Hadoop or Spark, etc. Flink should support configuration via shell 
 environment variables. In cluster setups, this makes things a lot easier 
 because writing config files can be omitted. Many automation tools (e.g. 
 Google's bdutil) use (or abuse) this feature.
 For example, to set up the task manager heap size, we would run `export 
 FLINK_TASKMANAGER_HEAP=4096` before starting the task manager on a node to 
 set the heap memory size to 4096MB.
 Environment variables should overwrite the regular config entries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1426) JobManager AJAX requests sometimes fail

2015-01-21 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285683#comment-14285683
 ] 

Alexander Alexandrov commented on FLINK-1426:
-

He is, he has just not opened a JIRA issue about that yet. A preview should be 
pushed to his repo in the next days:

https://github.com/MatzeDS/incubator-flink

We will announce it when it's done.

Regards,
Alexander

 JobManager AJAX requests sometimes fail
 ---

 Key: FLINK-1426
 URL: https://issues.apache.org/jira/browse/FLINK-1426
 Project: Flink
  Issue Type: Bug
  Components: JobManager, Webfrontend
Reporter: Robert Metzger

 It seems that the JobManager sometimes (I think when accessing it the first 
 time) does not show the number of TMs / slots.
 A simple workaround is re-loading it, but still, users are complaining about 
 it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1426) JobManager AJAX requests sometimes fail

2015-01-21 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285657#comment-14285657
 ] 

Alexander Alexandrov commented on FLINK-1426:
-

Just to restate - I have a student who is rewriting this as part of his 
bachelor thesis. 

The goal here is to make a PR for a unified version of the JobManager and the 
SubmissionTool in the next 4-6 weeks. We should avoid investing time solving 
bugs in code that might change in the near future. I will ping him and let him 
introduce himself here.

 JobManager AJAX requests sometimes fail
 ---

 Key: FLINK-1426
 URL: https://issues.apache.org/jira/browse/FLINK-1426
 Project: Flink
  Issue Type: Bug
  Components: JobManager, Webfrontend
Reporter: Robert Metzger

 It seems that the JobManager sometimes (I think when accessing it the first 
 time) does not show the number of TMs / slots.
 A simple workaround is re-loading it, but still, users are complaining about 
 it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-1422) Missing usage example for withParameters

2015-01-20 Thread Alexander Alexandrov (JIRA)
Alexander Alexandrov created FLINK-1422:
---

 Summary: Missing usage example for withParameters
 Key: FLINK-1422
 URL: https://issues.apache.org/jira/browse/FLINK-1422
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 0.8
Reporter: Alexander Alexandrov
Priority: Trivial
 Fix For: 0.8.1


I am struggling to find a usage example of the withParameters method in the 
documentation. At the moment I only see this note:

{quote}
Note: As the content of broadcast variables is kept in-memory on each node, it 
should not become too large. For simpler things like scalar values you can 
simply make parameters part of the closure of a function, or use the 
withParameters(...) method to pass in a configuration.
{quote}
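
A minimal sketch of what such a usage example could look like (the key name and 
values are illustrative):

{code:java}
import org.apache.flink.api.common.functions.RichFilterFunction;
import org.apache.flink.configuration.Configuration;

// Pass a Configuration to the operator via withParameters(...) and read it back
// in the rich function's open() method.
public static final class LimitFilter extends RichFilterFunction<Integer> {
    private int limit;

    @Override
    public void open(Configuration parameters) {
        this.limit = parameters.getInteger("limit", 0);
    }

    @Override
    public boolean filter(Integer value) {
        return value > limit;
    }
}

// In the driver program:
// Configuration config = new Configuration();
// config.setInteger("limit", 16);
// data.filter(new LimitFilter()).withParameters(config);
{code}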



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)