[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm

2016-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270061#comment-15270061
 ] 

ASF GitHub Bot commented on FLINK-2044:
---

Github user gallenvara commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61986359
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+import org.apache.flink.util.Preconditions;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm assigns
+ * two scores to every vertex, a hub score and an authority score. A good hub
+ * represents a page that points to many other pages, and a good authority represents
+ * a page that is linked by many different hubs. The implementation assumes that the
+ * two values on every vertex are the same at the beginning.
+ * 
+ * If the number of vertices of the input graph is known, it should be provided as a
+ * parameter to speed up computation. Otherwise, the algorithm will first execute a
+ * job to count the vertices.
+ */
+public class HITSAlgorithm implements GraphAlgorithm>> {
+
+   public static enum HITSParameter {
+   HUB,
+   AUTHORITY
+   }
+
+   private int maxIterations;
+   private long numberOfVertices;
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is known,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, 
HITSParameter)} constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter whether hub or authority values should be computed and returned
+*/
+   public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) {
+   Preconditions.checkArgument(maxIterations > 0, "The maximum number of iterations should be greater than 0.");
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   }
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is unknown,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} 
constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter whether hub or authority values should be computed and returned
+*/
+   public HITSAlgorithm(int maxIterations, long numberOfVertices, 
HITSParameter hitsParameter) {
+   this(maxIterations, hitsParameter);
+   Preconditions.checkArgument(numberOfVertices > 0, "The number of vertices in the graph should be greater than 0.");
+   this.numberOfVertices = numberOfVertices;
+   }
+
+   @Override
+   public DataSet> run(Graph netGraph) 
throws Exception {
+   if (this.numberOfVertices == 

[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...

2016-05-03 Thread gallenvara
Github user gallenvara commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61986359
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+import org.apache.flink.util.Preconditions;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm assigns
+ * two scores to every vertex, a hub score and an authority score. A good hub
+ * represents a page that points to many other pages, and a good authority represents
+ * a page that is linked by many different hubs. The implementation assumes that the
+ * two values on every vertex are the same at the beginning.
+ * 
+ * If the number of vertices of the input graph is known, it should be provided as a
+ * parameter to speed up computation. Otherwise, the algorithm will first execute a
+ * job to count the vertices.
+ */
+public class HITSAlgorithm implements GraphAlgorithm>> {
+
+   public static enum HITSParameter {
+   HUB,
+   AUTHORITY
+   }
+
+   private int maxIterations;
+   private long numberOfVertices;
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is known,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, 
HITSParameter)} constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter whether hub or authority values should be computed and returned
+*/
+   public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) {
+   Preconditions.checkArgument(maxIterations > 0, "The maximum number of iterations should be greater than 0.");
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   }
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is unknown,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} 
constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter whether hub or authority values should be computed and returned
+*/
+   public HITSAlgorithm(int maxIterations, long numberOfVertices, 
HITSParameter hitsParameter) {
+   this(maxIterations, hitsParameter);
+   Preconditions.checkArgument(numberOfVertices > 0, "The number of vertices in the graph should be greater than 0.");
+   this.numberOfVertices = numberOfVertices;
+   }
+
+   @Override
+   public DataSet> run(Graph netGraph) 
throws Exception {
+   if (this.numberOfVertices == 0) {
+   this.numberOfVertices = netGraph.numberOfVertices();
--- End diff --

I'm sorry that I forgot to remove it. `Sum normalization` does not need the 
number of vertices; it is only necessary for `z-score
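
For readers following the thread, here is a minimal sketch of the distinction being 
discussed; it is illustrative only (the helper names are made up, and this is not the 
PR's code). Sum normalization needs only the sum of the scores, while z-score 
standardization needs the number of vertices to compute mean and standard deviation:

```
import java.util.Arrays;

class NormalizationSketch {

    // Sum normalization: rescale so the scores add up to 1. No vertex count needed.
    static double[] sumNormalize(double[] scores) {
        double sum = Arrays.stream(scores).sum();
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = scores[i] / sum;
        }
        return out;
    }

    // z-score standardization: the vertex count enters through mean and std dev.
    static double[] zScore(double[] scores, long numberOfVertices) {
        double mean = Arrays.stream(scores).sum() / numberOfVertices;
        double variance = 0;
        for (double s : scores) {
            variance += (s - mean) * (s - mean);
        }
        double stdDev = Math.sqrt(variance / numberOfVertices);
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = (scores[i] - mean) / stdDev;
        }
        return out;
    }
}
```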

[jira] [Commented] (FLINK-3857) Add reconnect attempt to Elasticsearch host

2016-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270033#comment-15270033
 ] 

ASF GitHub Bot commented on FLINK-3857:
---

GitHub user sbcd90 opened a pull request:

https://github.com/apache/flink/pull/1962

[FLINK-3857][Streaming Connectors]Add reconnect attempt to Elasticsearch 
host

- [ ] General
  - The pull request references the related JIRA issue ("[FLINK-3857] Add 
reconnect attempt to Elasticsearch host")

- [ ] Documentation
  - Documentation added based on the changes made.

- [ ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sbcd90/flink elasticSearchRetryIssue

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1962.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1962


commit 62990c984f0d2eca3ba89ed9c2d22c469f16b136
Author: Subhobrata Dey 
Date:   2016-05-04T02:16:35Z

[FLINK-3857][Streaming Connectors]Add reconnect attempt to Elasticsearch 
host




> Add reconnect attempt to Elasticsearch host
> ---
>
> Key: FLINK-3857
> URL: https://issues.apache.org/jira/browse/FLINK-3857
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming Connectors
>Affects Versions: 1.1.0, 1.0.2
>Reporter: Fabian Hueske
>Assignee: Subhobrata Dey
>
> Currently, the connection to the Elasticsearch host is opened in 
> {{ElasticsearchSink.open()}}. In case the connection is lost (maybe due to a 
> changed DNS entry), the sink fails.
> I propose to catch the Exception for lost connections in the {{invoke()}} 
> method and try to re-open the connection for a configurable number of times 
> with a certain delay.
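
A minimal sketch of the proposed retry behavior, under the assumption of illustrative 
names ({{maxRetries}}, {{retryDelayMillis}}); the actual configuration is left to the PR:

{code}
// Hedged sketch of the proposal above; names are illustrative, not Flink API.
class ReconnectingSinkSketch {
    void invokeWithRetry(Runnable send, Runnable reopenConnection,
                         int maxRetries, long retryDelayMillis) throws InterruptedException {
        for (int attempt = 0; ; attempt++) {
            try {
                send.run();                     // normal path: element is emitted
                return;
            } catch (RuntimeException lostConnection) {
                if (attempt >= maxRetries) {
                    throw lostConnection;       // give up after the configured retries
                }
                Thread.sleep(retryDelayMillis); // wait with a certain delay
                reopenConnection.run();         // try to re-open the connection
            }
        }
    }
}
{code}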





[GitHub] flink pull request: [FLINK-3857][Streaming Connectors]Add reconnec...

2016-05-03 Thread sbcd90
GitHub user sbcd90 opened a pull request:

https://github.com/apache/flink/pull/1962

[FLINK-3857][Streaming Connectors]Add reconnect attempt to Elasticsearch 
host

- [ ] General
  - The pull request references the related JIRA issue ("[FLINK-3857] Add 
reconnect attempt to Elasticsearch host")

- [ ] Documentation
  - Documentation added based on the changes made.

- [ ] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sbcd90/flink elasticSearchRetryIssue

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1962.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1962


commit 62990c984f0d2eca3ba89ed9c2d22c469f16b136
Author: Subhobrata Dey 
Date:   2016-05-04T02:16:35Z

[FLINK-3857][Streaming Connectors]Add reconnect attempt to Elasticsearch 
host






[jira] [Updated] (FLINK-3866) StringArraySerializer claims type is immutable; shouldn't

2016-05-03 Thread Tatu Saloranta (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tatu Saloranta updated FLINK-3866:
--
Description: 
Looking at default `TypeSerializer` instances I noticed what looks like a minor 
flaw, unless I am missing something.
Whereas all other array serializers indicate that type is not immutable (since 
in Java, arrays are not immutable), `StringArraySerializer` has:

```
@Override
public boolean isImmutableType() {
return true;
}
```

and I think it should instead return `false`. I could create a PR, but it seems 
like a small enough thing that an issue report makes more sense.
I tried looking for deps to see if there's a test for this, but couldn't find 
one; otherwise I could submit a test fix.
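
To make the argument concrete, a small self-contained example (illustrative only, not 
Flink code) of why a mutable array type should not be declared immutable: a serializer 
that skips defensive copies lets callers observe each other's mutations.

```
public class ArrayMutability {
    public static void main(String[] args) {
        String[] original = {"a", "b"};
        String[] shared = original;      // what "immutable" would justify: no copy
        shared[0] = "changed";
        System.out.println(original[0]); // prints "changed": arrays are mutable
    }
}
```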




  was:
Looking at default `TypeSerializer` instances I noticed what looks like a minor 
flaw, unless I am missing something.
Whereas all other array serializers indicate that type is not immutable (since 
in Java, arrays are not immutable), `StringArraySerializer` has:

```
@Override
public boolean isImmutableType() {
return true;
}
```

and I think it should instead return `false`.





> StringArraySerializer claims type is immutable; shouldn't
> -
>
> Key: FLINK-3866
> URL: https://issues.apache.org/jira/browse/FLINK-3866
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.0.3
>Reporter: Tatu Saloranta
>Priority: Minor
>
> Looking at default `TypeSerializer` instances I noticed what looks like a 
> minor flaw, unless I am missing something.
> Whereas all other array serializers indicate that type is not immutable 
> (since in Java, arrays are not immutable), `StringArraySerializer` has:
> ```
>   @Override
>   public boolean isImmutableType() {
>   return true;
>   }
> ```
> and I think it should instead return `false`. I could create a PR, but it seems 
> like a small enough thing that an issue report makes more sense.
> I tried looking for deps to see if there's a test for this, but couldn't find 
> one; otherwise I could submit a test fix.





[jira] [Updated] (FLINK-3866) StringArraySerializer claims type is immutable; shouldn't

2016-05-03 Thread Tatu Saloranta (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tatu Saloranta updated FLINK-3866:
--
Description: 
Looking at default `TypeSerializer` instances I noticed what looks like a minor 
flaw, unless I am missing something.
Whereas all other array serializers indicate that type is not immutable (since 
in Java, arrays are not immutable), `StringArraySerializer` has:

```
@Override
public boolean isImmutableType() {
return true;
}
```

and I think it should instead return `false`.




  was:
The Jackson version in use (2.4.2) is rather old (and not even the latest patch 
from that minor version), so it'd make sense to upgrade to something a bit newer. 
The latest would be 2.7.4, but at first I propose going to 2.5.5.

All tests pass, but if there are issues I'd be happy to help; I'm the author of 
the Jackson project.



> StringArraySerializer claims type is immutable; shouldn't
> -
>
> Key: FLINK-3866
> URL: https://issues.apache.org/jira/browse/FLINK-3866
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.0.3
>Reporter: Tatu Saloranta
>Priority: Minor
>
> Looking at default `TypeSerializer` instances I noticed what looks like a 
> minor flaw, unless I am missing something.
> Whereas all other array serializers indicate that type is not immutable 
> (since in Java, arrays are not immutable), `StringArraySerializer` has:
> ```
>   @Override
>   public boolean isImmutableType() {
>   return true;
>   }
> ```
> and I think it should instead return `false`.





[jira] [Created] (FLINK-3866) StringArraySerializer claims type is immutable; shouldn't

2016-05-03 Thread Tatu Saloranta (JIRA)
Tatu Saloranta created FLINK-3866:
-

 Summary: StringArraySerializer claims type is immutable; shouldn't
 Key: FLINK-3866
 URL: https://issues.apache.org/jira/browse/FLINK-3866
 Project: Flink
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.0.3
Reporter: Tatu Saloranta
Priority: Minor


The Jackson version in use (2.4.2) is rather old (and not even the latest patch 
from that minor version), so it'd make sense to upgrade to something a bit newer. 
The latest would be 2.7.4, but at first I propose going to 2.5.5.

All tests pass, but if there are issues I'd be happy to help; I'm the author of 
the Jackson project.






[jira] [Commented] (FLINK-3855) Upgrade Jackson version

2016-05-03 Thread Tatu Saloranta (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269725#comment-15269725
 ] 

Tatu Saloranta commented on FLINK-3855:
---

Also: if I am not mistaken, the Elasticsearch2 connector does not actually use 
Jackson; it just has a dependency, which looks like something that could be 
removed (unless there's something wrong with transitive dependencies)?

The Kafka connector has a small dependency (a convenience binding).


> Upgrade Jackson version
> ---
>
> Key: FLINK-3855
> URL: https://issues.apache.org/jira/browse/FLINK-3855
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.0.3
>Reporter: Tatu Saloranta
>Priority: Minor
>
> The Jackson version in use (2.4.2) is rather old (and not even the latest patch 
> from that minor version), so it'd make sense to upgrade to something a bit newer. 
> The latest would be 2.7.4, but at first I propose going to 2.5.5.
> All tests pass, but if there are issues I'd be happy to help; I'm the author of 
> the Jackson project.





[GitHub] flink pull request: [Flink-3691] extend avroinputformat to support...

2016-05-03 Thread gna-phetsarath
Github user gna-phetsarath commented on a diff in the pull request:

https://github.com/apache/flink/pull/1920#discussion_r61957444
  
--- Diff: 
flink-batch-connectors/flink-avro/src/test/java/org/apache/flink/api/io/avro/AvroRecordInputFormatTest.java
 ---
@@ -289,6 +290,119 @@ public void testDeserializeToSpecificType() throws 
IOException {
}
}
 
+   /**
+* Test if the AvroInputFormat is able to properly read data from an 
Avro
+* file as a GenericRecord.
+* 
+* @throws IOException if an I/O error occurs
+*/
+   @SuppressWarnings("unchecked")
+   @Test
+   public void testDeserialisationGenericRecord() throws IOException {
+   Configuration parameters = new Configuration();
+
+   AvroInputFormat format = new 
AvroInputFormat(new Path(testFile.getAbsolutePath()),
+   GenericRecord.class);
+   try {
+   format.configure(parameters);
+   FileInputSplit[] splits = format.createInputSplits(1);
+   assertEquals(splits.length, 1);
+   format.open(splits[0]);
+
+   GenericRecord u = format.nextRecord(null);
--- End diff --

From ```GenericData.class```, if you pass a null, a new instance of the 
```Record``` will be created:
```
  /**
   * Called to create new record instances. Subclasses may override to use a
   * different record implementation. The returned instance must conform to 
the
   * schema provided. If the old object contains fields not present in the
   * schema, they should either be removed from the old object, or it should
   * create a new instance that conforms to the schema. By default, this 
returns
   * a {@link GenericData.Record}.
   */
  public Object newRecord(Object old, Schema schema) {
if (old instanceof IndexedRecord) {
  IndexedRecord record = (IndexedRecord)old;
  if (record.getSchema() == schema)
return record;
}
return new GenericData.Record(schema);
  }
```
So, I think it's valid.




[jira] [Created] (FLINK-3865) ExecutionConfig NullPointerException with second execution

2016-05-03 Thread Greg Hogan (JIRA)
Greg Hogan created FLINK-3865:
-

 Summary: ExecutionConfig NullPointerException with second execution
 Key: FLINK-3865
 URL: https://issues.apache.org/jira/browse/FLINK-3865
 Project: Flink
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.0
Reporter: Greg Hogan
Priority: Blocker


The following {{NullPointerException}} occurred with pr1956 rebased to master.

After the first execution (the program calls {{DataSet.count()}}), the call to 
{{ExecutionConfig.serializeUserCode}} sets {{registeredKryoTypes}} and other 
fields to null. During the second execution (creating the actual result), 
accessing this field throws a {{NullPointerException}}.

[~till.rohrmann] should {{serializeUserCode}} set the fields to a new 
{{LinkedHashSet}} and leave {{globalJobParameters}} unchanged?

{code}
org.apache.flink.client.program.ProgramInvocationException: The main method 
caused an error.
at 
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:520)
at 
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:403)
at org.apache.flink.client.program.Client.runBlocking(Client.java:248)
at 
org.apache.flink.client.CliFrontend.executeProgramBlocking(CliFrontend.java:860)
at org.apache.flink.client.CliFrontend.run(CliFrontend.java:327)
at 
org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1187)
at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1238)
Caused by: java.lang.NullPointerException
at 
org.apache.flink.api.common.ExecutionConfig.registerKryoType(ExecutionConfig.java:625)
at 
org.apache.flink.api.java.typeutils.runtime.kryo.Serializers.recursivelyRegisterType(Serializers.java:96)
at 
org.apache.flink.api.java.typeutils.runtime.kryo.Serializers.recursivelyRegisterType(Serializers.java:66)
at 
org.apache.flink.api.java.ExecutionEnvironment$1.preVisit(ExecutionEnvironment.java:1053)
at 
org.apache.flink.api.java.ExecutionEnvironment$1.preVisit(ExecutionEnvironment.java:1046)
at 
org.apache.flink.api.common.operators.SingleInputOperator.accept(SingleInputOperator.java:198)
at 
org.apache.flink.api.common.operators.SingleInputOperator.accept(SingleInputOperator.java:199)
at 
org.apache.flink.api.common.operators.SingleInputOperator.accept(SingleInputOperator.java:199)
at 
org.apache.flink.api.common.operators.SingleInputOperator.accept(SingleInputOperator.java:199)
at 
org.apache.flink.api.common.operators.SingleInputOperator.accept(SingleInputOperator.java:199)
at 
org.apache.flink.api.common.operators.SingleInputOperator.accept(SingleInputOperator.java:199)
at 
org.apache.flink.api.common.operators.DualInputOperator.accept(DualInputOperator.java:281)
at 
org.apache.flink.api.common.operators.GenericDataSinkBase.accept(GenericDataSinkBase.java:220)
at org.apache.flink.api.common.Plan.accept(Plan.java:333)
at 
org.apache.flink.api.java.ExecutionEnvironment.createProgramPlan(ExecutionEnvironment.java:1046)
at 
org.apache.flink.api.java.ExecutionEnvironment.createProgramPlan(ExecutionEnvironment.java:1004)
at 
org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:58)
at 
org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:898)
at 
org.apache.flink.api.java.utils.DataSetUtils.checksumHashCode(DataSetUtils.java:350)
at org.apache.flink.graph.examples.HITS.main(HITS.java:114)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:505)
... 6 more
{code}
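
A hedged sketch of the fix suggested above (the field name mirrors the report; this is 
not the actual Flink code): reset the collection to a fresh instance instead of nulling 
it, so the second plan translation succeeds.

{code}
import java.util.LinkedHashSet;
import java.util.Set;

class ExecutionConfigSketch {
    private Set<Class<?>> registeredKryoTypes = new LinkedHashSet<>();

    void registerKryoType(Class<?> type) {
        registeredKryoTypes.add(type);  // NPE here if the field was set to null
    }

    void serializeUserCode() {
        // ... serialize registeredKryoTypes for shipping to the cluster ...
        registeredKryoTypes = new LinkedHashSet<>();  // reset instead of null
    }
}
{code}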





[jira] [Assigned] (FLINK-3758) Add possibility to register accumulators in custom triggers

2016-05-03 Thread Konstantin Knauf (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Knauf reassigned FLINK-3758:
---

Assignee: Konstantin Knauf

> Add possibility to register accumulators in custom triggers
> ---
>
> Key: FLINK-3758
> URL: https://issues.apache.org/jira/browse/FLINK-3758
> Project: Flink
>  Issue Type: Improvement
>Reporter: Konstantin Knauf
>Assignee: Konstantin Knauf
>Priority: Minor
>
> For monitoring purposes it would be nice to be able to use accumulators in 
> custom trigger functions. 
> Basically, the trigger context could just expose {{getAccumulator}} of 
> {{RuntimeContext}}, or does this create problems I am not aware of?
> Adding accumulators in a trigger function is more difficult, I think, but 
> that's not really necessary, as the accumulator could just be added in some 
> other upstream operator.
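
A hypothetical sketch of the exposure being proposed (this method does not exist on the 
current trigger context; {{RuntimeContext#getAccumulator}} does):

{code}
import java.io.Serializable;
import org.apache.flink.api.common.accumulators.Accumulator;
import org.apache.flink.api.common.functions.RuntimeContext;

// Sketch only: a trigger context that simply delegates accumulator lookup.
class AccumulatingTriggerContextSketch {
    private final RuntimeContext runtimeContext;

    AccumulatingTriggerContextSketch(RuntimeContext runtimeContext) {
        this.runtimeContext = runtimeContext;
    }

    <V, A extends Serializable> Accumulator<V, A> getAccumulator(String name) {
        return runtimeContext.getAccumulator(name);  // expose, do not re-implement
    }
}
{code}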





[jira] [Commented] (FLINK-3857) Add reconnect attempt to Elasticsearch host

2016-05-03 Thread Subhobrata Dey (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269345#comment-15269345
 ] 

Subhobrata Dey commented on FLINK-3857:
---

Hello [~fhueske],

I'm interested in the task & assigning it to myself.

> Add reconnect attempt to Elasticsearch host
> ---
>
> Key: FLINK-3857
> URL: https://issues.apache.org/jira/browse/FLINK-3857
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming Connectors
>Affects Versions: 1.1.0, 1.0.2
>Reporter: Fabian Hueske
>
> Currently, the connection to the Elasticsearch host is opened in 
> {{ElasticsearchSink.open()}}. In case the connection is lost (maybe due to a 
> changed DNS entry), the sink fails.
> I propose to catch the Exception for lost connections in the {{invoke()}} 
> method and try to re-open the connection for a configurable number of times 
> with a certain delay.





[jira] [Assigned] (FLINK-3857) Add reconnect attempt to Elasticsearch host

2016-05-03 Thread Subhobrata Dey (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Subhobrata Dey reassigned FLINK-3857:
-

Assignee: Subhobrata Dey

> Add reconnect attempt to Elasticsearch host
> ---
>
> Key: FLINK-3857
> URL: https://issues.apache.org/jira/browse/FLINK-3857
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming Connectors
>Affects Versions: 1.1.0, 1.0.2
>Reporter: Fabian Hueske
>Assignee: Subhobrata Dey
>
> Currently, the connection to the Elasticsearch host is opened in 
> {{ElasticsearchSink.open()}}. In case the connection is lost (maybe due to a 
> changed DNS entry), the sink fails.
> I propose to catch the Exception for lost connections in the {{invoke()}} 
> method and try to re-open the connection for a configurable number of times 
> with a certain delay.





[jira] [Commented] (FLINK-3854) Support Avro key-value rolling sink writer

2016-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269335#comment-15269335
 ] 

ASF GitHub Bot commented on FLINK-3854:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1953#issuecomment-216627907
  
That's a problem we currently have since the Wikipedia IRC channel times 
out. Restarting wouldn't help, but in the future, if you want to restart a 
build you can push a new (possibly empty) commit.
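
For reference, such an empty commit can be created with standard git (assuming push 
access to the PR branch):

$ git commit --allow-empty -m "retrigger CI"
$ git push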


> Support Avro key-value rolling sink writer
> --
>
> Key: FLINK-3854
> URL: https://issues.apache.org/jira/browse/FLINK-3854
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming Connectors
>Affects Versions: 1.0.3
>Reporter: Igor Berman
>
> Support rolling sink writer in avro key value format.
> preferably without additional classpath dependencies
> preferable in same format as M/R jobs for backward compatibility





[GitHub] flink pull request: [FLINK-3854] Support Avro key-value rolling si...

2016-05-03 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1953#issuecomment-216627907
  
That's a problem we currently have since the Wikipedia IRC channel times 
out. Restarting wouldn't help, but in the future, if you want to restart a 
build you can push a new (possibly empty) commit.




[GitHub] flink pull request: [FLINK-3854] Support Avro key-value rolling si...

2016-05-03 Thread IgorBerman
Github user IgorBerman commented on the pull request:

https://github.com/apache/flink/pull/1953#issuecomment-216622861
  
@aljoscha can we somehow rerun the build? I've checked that it failed on 
flink-connector-wikiedits, which isn't related... unless I'm missing something

Running 
org.apache.flink.streaming.connectors.wikiedits.WikipediaEditsSourceTest
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 120.022 sec 
<<< FAILURE! - in 
org.apache.flink.streaming.connectors.wikiedits.WikipediaEditsSourceTest

testWikipediaEditsSource(org.apache.flink.streaming.connectors.wikiedits.WikipediaEditsSourceTest)
  Time elapsed: 120.01 sec  <<< ERROR!
java.lang.Exception: test timed out after 120000 milliseconds
at sun.misc.Unsafe.park(Native Method)
at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
at 
scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208)
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)




[jira] [Commented] (FLINK-3854) Support Avro key-value rolling sink writer

2016-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269314#comment-15269314
 ] 

ASF GitHub Bot commented on FLINK-3854:
---

Github user IgorBerman commented on the pull request:

https://github.com/apache/flink/pull/1953#issuecomment-216622861
  
@aljoscha can we somehow rerun the build? I've checked that it failed on 
flink-connector-wikiedits, which isn't related... unless I'm missing something

Running 
org.apache.flink.streaming.connectors.wikiedits.WikipediaEditsSourceTest
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 120.022 sec 
<<< FAILURE! - in 
org.apache.flink.streaming.connectors.wikiedits.WikipediaEditsSourceTest

testWikipediaEditsSource(org.apache.flink.streaming.connectors.wikiedits.WikipediaEditsSourceTest)
  Time elapsed: 120.01 sec  <<< ERROR!
java.lang.Exception: test timed out after 120000 milliseconds
at sun.misc.Unsafe.park(Native Method)
at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
at 
scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208)
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)


> Support Avro key-value rolling sink writer
> --
>
> Key: FLINK-3854
> URL: https://issues.apache.org/jira/browse/FLINK-3854
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming Connectors
>Affects Versions: 1.0.3
>Reporter: Igor Berman
>
> Support rolling sink writer in avro key value format.
> preferably without additional classpath dependencies
> preferable in same format as M/R jobs for backward compatibility





[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm

2016-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269194#comment-15269194
 ] 

ASF GitHub Bot commented on FLINK-2044:
---

Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61925324
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+import org.apache.flink.util.Preconditions;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm assigns
+ * two scores to every vertex, a hub score and an authority score. A good hub
+ * represents a page that points to many other pages, and a good authority represents
+ * a page that is linked by many different hubs. The implementation assumes that the
+ * two values on every vertex are the same at the beginning.
+ * 
+ * If the number of vertices of the input graph is known, it should be provided as a
+ * parameter to speed up computation. Otherwise, the algorithm will first execute a
+ * job to count the vertices.
+ */
+public class HITSAlgorithm implements GraphAlgorithm>> {
+
+   public static enum HITSParameter {
+   HUB,
+   AUTHORITY
+   }
+
+   private int maxIterations;
+   private long numberOfVertices;
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is known,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, 
HITSParameter)} constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter whether hub or authority values should be computed and returned
+*/
+   public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) {
+   Preconditions.checkArgument(maxIterations > 0, "The maximum number of iterations should be greater than 0.");
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   }
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is unknown,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} 
constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter whether hub or authority values should be computed and returned
+*/
+   public HITSAlgorithm(int maxIterations, long numberOfVertices, 
HITSParameter hitsParameter) {
+   this(maxIterations, hitsParameter);
+   Preconditions.checkArgument(numberOfVertices > 0, "The number of vertices in the graph should be greater than 0.");
+   this.numberOfVertices = numberOfVertices;
+   }
+
+   @Override
+   public DataSet> run(Graph netGraph) 
throws Exception {
+   if (this.numberOfVertices == 

[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...

2016-05-03 Thread greghogan
Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61925324
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+import org.apache.flink.util.Preconditions;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm assigns
+ * two scores to every vertex, a hub score and an authority score. A good hub
+ * represents a page that points to many other pages, and a good authority represents
+ * a page that is linked by many different hubs. The implementation assumes that the
+ * two values on every vertex are the same at the beginning.
+ * 
+ * If the number of vertices of the input graph is known, it should be provided as a
+ * parameter to speed up computation. Otherwise, the algorithm will first execute a
+ * job to count the vertices.
+ */
+public class HITSAlgorithm implements GraphAlgorithm>> {
+
+   public static enum HITSParameter {
+   HUB,
+   AUTHORITY
+   }
+
+   private int maxIterations;
+   private long numberOfVertices;
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is known,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, long, 
HITSParameter)} constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter whether hub or authority values should be computed and returned
+*/
+   public HITSAlgorithm(int maxIterations, HITSParameter hitsParameter) {
+   Preconditions.checkArgument(maxIterations > 0, "The maximum number of iterations should be greater than 0.");
+   if (hitsParameter == HITSParameter.AUTHORITY) {
+   this.maxIterations = maxIterations * 2;
+   } else {
+   this.maxIterations = maxIterations * 2 + 1;
+   }
+   }
+
+   /**
+* Creates an instance of HITS algorithm.
+* If the number of vertices of the input graph is unknown,
+* use the {@link HITSAlgorithm#HITSAlgorithm(int, HITSParameter)} 
constructor instead.
+*
+* @param maxIterations the maximum number of iterations
+* @param hitsParameter whether hub or authority values should be computed and returned
+*/
+   public HITSAlgorithm(int maxIterations, long numberOfVertices, 
HITSParameter hitsParameter) {
+   this(maxIterations, hitsParameter);
+   Preconditions.checkArgument(numberOfVertices > 0, "The number of vertices in the graph should be greater than 0.");
+   this.numberOfVertices = numberOfVertices;
+   }
+
+   @Override
+   public DataSet> run(Graph netGraph) 
throws Exception {
+   if (this.numberOfVertices == 0) {
+   this.numberOfVertices = netGraph.numberOfVertices();
--- End diff --

Where do we use the number of vertices?



[jira] [Commented] (FLINK-3650) Add maxBy/minBy to Scala DataSet API

2016-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269116#comment-15269116
 ] 

ASF GitHub Bot commented on FLINK-3650:
---

Github user ramkrish86 commented on a diff in the pull request:

https://github.com/apache/flink/pull/1856#discussion_r61918711
  
--- Diff: 
flink-scala/src/main/scala/org/apache/flink/api/scala/DataSet.scala ---
@@ -1599,7 +1601,77 @@ class DataSet[T: ClassTag](set: JavaDataSet[T]) {
   def output(outputFormat: OutputFormat[T]): DataSink[T] = {
 javaSet.output(outputFormat)
   }
-  
+
+  /**
+* Selects an element with minimum value.
+* 
+* The minimum is computed over the specified fields in lexicographical order.
+* 
+* Example 1: Given a data set with elements [0, 1], [1, 0], the
+* results will be:
+* 
+* minBy(0): [0, 1]
+* minBy(1): [1, 0]
+* 
+* 
+* Example 2: Given a data set with elements [0, 0], [0, 1], the
+* results will be:
+* 
+* minBy(0, 1): [0, 0]
+* 
+* 
+* If multiple values with minimum value at the specified fields exist, a random
+* one will be picked.
+* 
+* Internally, this operation is implemented as a {@link ReduceFunction}.
+*
+*/
+  def minBy(fields: Int*) : Unit  = {
--- End diff --

This should return the ReduceOperator. My bad. I'm not sure whether the 
existing test case really tests the entire functionality.


> Add maxBy/minBy to Scala DataSet API
> 
>
> Key: FLINK-3650
> URL: https://issues.apache.org/jira/browse/FLINK-3650
> Project: Flink
>  Issue Type: Improvement
>  Components: Java API, Scala API
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: ramkrishna.s.vasudevan
>
> The stable Java DataSet API contains the API calls {{maxBy}} and {{minBy}}. 
> These methods are not supported by the Scala DataSet API. These methods 
> should be added in order to have a consistent API.





[GitHub] flink pull request: FLINK-3650 Add maxBy/minBy to Scala DataSet AP...

2016-05-03 Thread ramkrish86
Github user ramkrish86 commented on a diff in the pull request:

https://github.com/apache/flink/pull/1856#discussion_r61918711
  
--- Diff: 
flink-scala/src/main/scala/org/apache/flink/api/scala/DataSet.scala ---
@@ -1599,7 +1601,77 @@ class DataSet[T: ClassTag](set: JavaDataSet[T]) {
   def output(outputFormat: OutputFormat[T]): DataSink[T] = {
 javaSet.output(outputFormat)
   }
-  
+
+  /**
+* Selects an element with minimum value.
+* 
+* The minimum is computed over the specified fields in lexicographical order.
+* 
+* Example 1: Given a data set with elements [0, 1], [1, 0], the
+* results will be:
+* 
+* minBy(0): [0, 1]
+* minBy(1): [1, 0]
+* 
+* 
+* Example 2: Given a data set with elements [0, 0], [0, 1], the
+* results will be:
+* 
+* minBy(0, 1): [0, 0]
+* 
+* 
+* If multiple values with minimum value at the specified fields exist, a random
+* one will be picked.
+* 
+* Internally, this operation is implemented as a {@link ReduceFunction}.
+*
+*/
+  def minBy(fields: Int*) : Unit  = {
--- End diff --

This should return the ReduceOperator. My bad. I'm not sure whether the 
existing test case really tests the entire functionality.




[jira] [Created] (FLINK-3864) Yarn tests don't check for prohibited strings in log output

2016-05-03 Thread Maximilian Michels (JIRA)
Maximilian Michels created FLINK-3864:
-

 Summary: Yarn tests don't check for prohibited strings in log 
output
 Key: FLINK-3864
 URL: https://issues.apache.org/jira/browse/FLINK-3864
 Project: Flink
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.0.2, 1.1.0
Reporter: Maximilian Michels
Assignee: Maximilian Michels
 Fix For: 1.1.0


{{YarnTestBase.runWithArgs(...)}} provides a parameter for strings which must 
not appear in the log output. {{perJobYarnCluster}} and 
{{perJobYarnClusterWithParallelism}} have "System.out)" prepended to the 
prohibited strings; probably an artifact of older test code.





[jira] [Commented] (FLINK-1996) Add output methods to Table API

2016-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269019#comment-15269019
 ] 

ASF GitHub Bot commented on FLINK-1996:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1961#issuecomment-216588444
  
Thanks for the feedback @yjshen. 

The motivation of the `TableSink` interface is to support very different 
storage systems (JDBC, Cassandra, Kafka, HBase, ...) and formats (CSV, Parquet, 
Avro, etc.). The idea is to reuse existing OutputFormats (DataSet) and 
SinkFunctions (DataStream) as much as possible. The configuration of the 
`TableSink` with field names and types happens internally and is not 
user-facing. 

While the goal is to support many different systems, we do not want to blow 
up the dependencies of the flink-table module. With the current design we 
can add TableSinks to the respective modules in `flink-batch-connectors` and 
`flink-streaming-connectors` and don't have to add all external dependencies to 
the Table API. Also we want to give users the option to define their own table 
sinks.

I am not sure about configuring the output type and parameters with untyped 
Strings. IMO, this makes it hard to identify and look up relevant parameters 
and options. 
But maybe we can add a registration of TableSinks to the TableEnvironment 
and do something like:

```
tEnv.registerSinkType("csv", classOf[CsvTableSink])

val t: Table = ...
t.toSink("csv").option("path", "/foo").option("fileDelim", "|")
```
We would need to find a way to pass the options to the TableSink 
constructor, maybe via reflection... 
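
As a rough illustration of that reflection idea (purely hypothetical; it maps each 
option to a same-named field rather than a constructor argument, and is not a design 
proposal of this PR):

```
import java.lang.reflect.Field;
import java.util.Map;

class SinkFactorySketch {
    // Instantiate a sink via its no-arg constructor and inject string options.
    static Object instantiate(Class<?> sinkClass, Map<String, String> options) throws Exception {
        Object sink = sinkClass.getDeclaredConstructor().newInstance();
        for (Map.Entry<String, String> opt : options.entrySet()) {
            Field field = sinkClass.getDeclaredField(opt.getKey());
            field.setAccessible(true);
            field.set(sink, opt.getValue());
        }
        return sink;
    }
}
```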


> Add output methods to Table API
> ---
>
> Key: FLINK-1996
> URL: https://issues.apache.org/jira/browse/FLINK-1996
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Affects Versions: 0.9
>Reporter: Fabian Hueske
>Assignee: Fabian Hueske
>
> Tables need to be converted to DataSets (or DataStreams) to write them out. 
> It would be good to have a way to emit Table results directly for example to 
> print, CSV, JDBC, HBase, etc.





[GitHub] flink pull request: [FLINK-1996] [tableApi] Add TableSink interfac...

2016-05-03 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1961#issuecomment-216588444
  
Thanks for the feedback @yjshen. 

The motivation of the `TableSink` interface is to support very different 
storage systems (JDBC, Cassandra, Kafka, HBase, ...) and formats (CSV, Parquet, 
Avro, etc.). The idea is to reuse existing OutputFormats (DataSet) and 
SinkFunctions (DataStream) as much as possible. The configuration of the 
`TableSink` with field names and types happens internally and is not 
user-facing. 

While the goal is to support many different systems, we do not want to blow 
up the dependencies of the flink-table module. With the current design we 
can add TableSinks to the respective modules in `flink-batch-connectors` and 
`flink-streaming-connectors` and don't have to add all external dependencies to 
the Table API. Also we want to give users the option to define their own table 
sinks.

I am not sure about configuring the output type and parameters with untyped 
Strings. IMO, this makes it hard to identify and look up relevant parameters 
and options. 
But maybe we can add a registration of TableSinks to the TableEnvironment 
and do something like:

```
tEnv.registerSinkType("csv", classOf[CsvTableSink])

val t: Table = ...
t.toSink("csv").option("path", "/foo").option("fileDelim", "|")
```
We would need to find a way to pass the options to the TableSink 
constructor, maybe via reflection... 




[jira] [Created] (FLINK-3863) Yarn Cluster shutdown may fail if leader changed recently

2016-05-03 Thread Maximilian Michels (JIRA)
Maximilian Michels created FLINK-3863:
-

 Summary: Yarn Cluster shutdown may fail if leader changed recently
 Key: FLINK-3863
 URL: https://issues.apache.org/jira/browse/FLINK-3863
 Project: Flink
  Issue Type: Bug
  Components: YARN Client
Affects Versions: 1.0.2, 1.1.0
Reporter: Maximilian Michels
Assignee: Maximilian Michels
Priority: Minor
 Fix For: 1.1.0


The {{ApplicationClient}} sets {{yarnJobManager}} to {{None}} until it has 
connected to a newly elected JobManager. A shutdown message to the application 
master is discarded while the ApplicationClient tries to reconnect. The 
ApplicationClient should retry shutting down the cluster once it is connected to 
the new leader. It may also time out (which currently is always the case).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1996) Add output methods to Table API

2016-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268940#comment-15268940
 ] 

ASF GitHub Bot commented on FLINK-1996:
---

Github user yjshen commented on the pull request:

https://github.com/apache/flink/pull/1961#issuecomment-216575391
  
Hi @fhueske , I've read through this PR and find the current API design a 
little weird.

Please correct me if I got something wrong: since we are outputting `Table`s, 
the schema is known at runtime, so why should we first create a type-agnostic 
`TableSink` and then configure it with specific names and types? What about
``` scala
val t: Table = ...
t.write().format("csv").option("delim", "|").option("path","/path/to/file")
env.execute()
```
and construct the `TableSink` when we are about to `execute()`? :)


> Add output methods to Table API
> ---
>
> Key: FLINK-1996
> URL: https://issues.apache.org/jira/browse/FLINK-1996
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Affects Versions: 0.9
>Reporter: Fabian Hueske
>Assignee: Fabian Hueske
>
> Tables need to be converted to DataSets (or DataStreams) to write them out. 
> It would be good to have a way to emit Table results directly for example to 
> print, CSV, JDBC, HBase, etc.





[GitHub] flink pull request: [FLINK-1996] [tableApi] Add TableSink interfac...

2016-05-03 Thread yjshen
Github user yjshen commented on the pull request:

https://github.com/apache/flink/pull/1961#issuecomment-216575391
  
Hi @fhueske , I've read through this PR and find the current API design a 
little weird.

Please correct me if I got something wrong: since we are outputting `Table`s, 
the schema is known at runtime, so why should we first create a type-agnostic 
`TableSink` and then configure it with specific names and types? What about
``` scala
val t: Table = ...
t.write().format("csv").option("delim", "|").option("path","/path/to/file")
env.execute()
```
and construct the `TableSink` when we are about to `execute()`? :)




[jira] [Commented] (FLINK-3862) Restructure community website

2016-05-03 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268920#comment-15268920
 ] 

Fabian Hueske commented on FLINK-3862:
--

+1 for moving the third party packages and connectors to a different site. 
Should we move it to a wiki page or keep it on the project page? 

I agree that the IRC channel is not well attended by Flink committers. It 
makes sense to move it down a bit.
Actually, I entered the channel today and found that about 15 people are in 
the channel. So maybe it makes sense to occasionally spend some time in the 
channel.

> Restructure community website
> -
>
> Key: FLINK-3862
> URL: https://issues.apache.org/jira/browse/FLINK-3862
> Project: Flink
>  Issue Type: Task
>  Components: Project Website
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Priority: Minor
>
> The community website contains a large section of third party packages. It 
> might make sense to create a dedicated third party packages site to declutter 
> the community site. Furthermore, we should move the IRC communication channel 
> a bit further down in order to encourage people to use other communication 
> channels instead.





[jira] [Updated] (FLINK-3862) Restructure community website

2016-05-03 Thread Till Rohrmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-3862:
-
Summary: Restructure community website  (was: Restructure community slide)

> Restructure community website
> -
>
> Key: FLINK-3862
> URL: https://issues.apache.org/jira/browse/FLINK-3862
> Project: Flink
>  Issue Type: Task
>  Components: Project Website
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Priority: Minor
>
> The community website contains a large section of third party packages. It 
> might make sense to create a dedicated third party packages site to declutter 
> the community site. Furthermore, we should move the IRC communication channel 
> a bit further down in order to encourage people to use other communication 
> channels instead.





[jira] [Created] (FLINK-3862) Restructure community slide

2016-05-03 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-3862:


 Summary: Restructure community slide
 Key: FLINK-3862
 URL: https://issues.apache.org/jira/browse/FLINK-3862
 Project: Flink
  Issue Type: Task
  Components: Project Website
Affects Versions: 1.1.0
Reporter: Till Rohrmann
Priority: Minor


The community website contains a large section of third party packages. It 
might make sense to create a dedicated third party packages site to declutter 
the community site. Furthermore, we should move the IRC communication channel a 
bit further down in order to encourage people to rather use other communication 
channels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: FLINK-3808: Refactor the whole file monitoring...

2016-05-03 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1929#issuecomment-216545948
  
Ah man, you're right, the test needs to be moved to the `flink-fs-tests` 
package, I think.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3808) Refactor the whole file monitoring source to take a fileInputFormat as an argument.

2016-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268790#comment-15268790
 ] 

ASF GitHub Bot commented on FLINK-3808:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1929#issuecomment-216545948
  
Ah man, you're right, the test needs to be moved to the `flink-fs-tests` 
package, I think.


> Refactor the whole file monitoring source to take a fileInputFormat as an 
> argument.
> ---
>
> Key: FLINK-3808
> URL: https://issues.apache.org/jira/browse/FLINK-3808
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Kostas Kloudas
>Assignee: Kostas Kloudas
>
> This issue is just an intermediate step towards making the file source 
> fault-tolerant.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-3669) WindowOperator registers a lot of timers at StreamTask

2016-05-03 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek resolved FLINK-3669.
-
   Resolution: Fixed
Fix Version/s: 1.1.0

Fixed in 
https://github.com/apache/flink/commit/e7586c3b2d995be164100919d7c04db003a71a90

[~uce] should I also put this on the release branch?

> WindowOperator registers a lot of timers at StreamTask
> --
>
> Key: FLINK-3669
> URL: https://issues.apache.org/jira/browse/FLINK-3669
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.0.1
>Reporter: Aljoscha Krettek
>Assignee: Konstantin Knauf
>Priority: Blocker
> Fix For: 1.1.0
>
>
> Right now, the WindowOperator registers a timer at the StreamTask for every 
> processing-time timer that a Trigger registers. We should combine several 
> registered trigger timers to only register one low-level timer (timer 
> coalescing).
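
A minimal sketch of the coalescing idea described above; the `registerLowLevelTimer` 
callback is a placeholder for the actual StreamTask timer API, so this illustrates 
the technique rather than the merged implementation:

{code}
import scala.collection.mutable

// Keep all requested trigger timestamps, but arm only one underlying timer.
class CoalescingTimerService(registerLowLevelTimer: Long => Unit) {
  private val pending = mutable.SortedSet.empty[Long]

  def registerTriggerTimer(timestamp: Long): Unit = {
    val earliestBefore = pending.headOption
    pending += timestamp
    // Only touch the low-level service if the new timer becomes the earliest.
    if (earliestBefore.forall(timestamp < _)) registerLowLevelTimer(timestamp)
  }

  def onLowLevelTimer(time: Long): Iterable[Long] = {
    val fired = pending.takeWhile(_ <= time).toList // all trigger timers now due
    pending --= fired
    pending.headOption.foreach(registerLowLevelTimer) // re-arm for the next one
    fired
  }
}
{code}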



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: Timer coalescing across keys and cleanup of un...

2016-05-03 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/1944#issuecomment-216535365
  
I merged it. Thanks a lot for your work! 😃 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2821) Change Akka configuration to allow accessing actors from different URLs

2016-05-03 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268703#comment-15268703
 ] 

Robert Metzger commented on FLINK-2821:
---

[~mxm], are you okay with me taking over these issues? My current task with 
Kinesis is blocked on this ;)

> Change Akka configuration to allow accessing actors from different URLs
> ---
>
> Key: FLINK-2821
> URL: https://issues.apache.org/jira/browse/FLINK-2821
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Runtime
>Reporter: Robert Metzger
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>
> Akka expects the actor's URL to match exactly.
> Cases where users have complained about this, as pointed out here: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Error-trying-to-access-JM-through-proxy-td3018.html
>   - Proxy routing (as described here, send to the proxy URL, receiver 
> recognizes only original URL)
>   - Using hostname / IP interchangeably does not work (we solved this by 
> always putting IP addresses into URLs, never hostnames)
>   - Binding to multiple interfaces (any local 0.0.0.0) does not work. Still 
> no solution to that (but seems not too much of a restriction)
> I am aware that this is not possible due to Akka, so it is actually not a 
> Flink bug. But I think we should track the resolution of the issue here 
> anyway because it's affecting our users' satisfaction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLINK-3662) Bump Akka version to 2.4.x for Scala 2.11.x

2016-05-03 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger reassigned FLINK-3662:
-

Assignee: Robert Metzger  (was: Maximilian Michels)

> Bump Akka version to 2.4.x for Scala 2.11.x
> ---
>
> Key: FLINK-3662
> URL: https://issues.apache.org/jira/browse/FLINK-3662
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System
>Affects Versions: 1.0.0
>Reporter: Maximilian Michels
>Assignee: Robert Metzger
> Fix For: 1.1.0
>
>
> In order to make use of newer Akka features (FLINK-2821), we need to update 
> Akka to version 2.4.x.
> To maintain backwards compatibility, we have to adjust the 
> {{change_scala_version}} script to update the Akka version depending on the 
> Scala version.
> Scala 2.10.x => Akka 2.3.x
> Scala 2.11.x => Akka 2.4.x



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2821) Change Akka configuration to allow accessing actors from different URLs

2016-05-03 Thread Maximilian Michels (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268708#comment-15268708
 ] 

Maximilian Michels commented on FLINK-2821:
---

Sure, go ahead!

> Change Akka configuration to allow accessing actors from different URLs
> ---
>
> Key: FLINK-2821
> URL: https://issues.apache.org/jira/browse/FLINK-2821
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Runtime
>Reporter: Robert Metzger
>Assignee: Maximilian Michels
> Fix For: 1.1.0
>
>
> Akka expects the actor's URL to match exactly.
> Cases where users have complained about this, as pointed out here: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Error-trying-to-access-JM-through-proxy-td3018.html
>   - Proxy routing (as described here, send to the proxy URL, receiver 
> recognizes only original URL)
>   - Using hostname / IP interchangeably does not work (we solved this by 
> always putting IP addresses into URLs, never hostnames)
>   - Binding to multiple interfaces (any local 0.0.0.0) does not work. Still 
> no solution to that (but seems not too much of a restriction)
> I am aware that this is not possible due to Akka, so it is actually not a 
> Flink bug. But I think we should track the resolution of the issue here 
> anyway because it's affecting our users' satisfaction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLINK-2821) Change Akka configuration to allow accessing actors from different URLs

2016-05-03 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger reassigned FLINK-2821:
-

Assignee: Robert Metzger  (was: Maximilian Michels)

> Change Akka configuration to allow accessing actors from different URLs
> ---
>
> Key: FLINK-2821
> URL: https://issues.apache.org/jira/browse/FLINK-2821
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Runtime
>Reporter: Robert Metzger
>Assignee: Robert Metzger
> Fix For: 1.1.0
>
>
> Akka expects the actor's URL to match exactly.
> Cases where users have complained about this, as pointed out here: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Error-trying-to-access-JM-through-proxy-td3018.html
>   - Proxy routing (as described here, send to the proxy URL, receiver 
> recognizes only original URL)
>   - Using hostname / IP interchangeably does not work (we solved this by 
> always putting IP addresses into URLs, never hostnames)
>   - Binding to multiple interfaces (any local 0.0.0.0) does not work. Still 
> no solution to that (but seems not too much of a restriction)
> I am aware that this is not possible due to Akka, so it is actually not a 
> Flink bug. But I think we should track the resolution of the issue here 
> anyway because it's affecting our users' satisfaction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (FLINK-3661) Make Scala 2.11.x the default Scala version

2016-05-03 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger reassigned FLINK-3661:
-

Assignee: Robert Metzger  (was: Maximilian Michels)

> Make Scala 2.11.x the default Scala version
> ---
>
> Key: FLINK-3661
> URL: https://issues.apache.org/jira/browse/FLINK-3661
> Project: Flink
>  Issue Type: Sub-task
>  Components: Build System, Distributed Runtime
>Affects Versions: 1.0.0
>Reporter: Maximilian Michels
>Assignee: Robert Metzger
> Fix For: 1.1.0
>
>
> Flink's default Scala version is 2.10.4. I'd propose to update it to Scala 
> 2.11.8 while still keeping the option to use Scala 2.10.x.
> By now, Scala 2.11 is already the preferred version for many users, and Scala 
> 2.12 is around the corner.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1827) Move test classes in test folders and fix scope of test dependencies

2016-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268691#comment-15268691
 ] 

ASF GitHub Bot commented on FLINK-1827:
---

Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/1915#issuecomment-216525073
  
@stefanobortoli yes.


> Move test classes in test folders and fix scope of test dependencies
> 
>
> Key: FLINK-1827
> URL: https://issues.apache.org/jira/browse/FLINK-1827
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Affects Versions: 0.9
>Reporter: Flavio Pompermaier
>Priority: Minor
>  Labels: test-compile
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Right now it is not possible to avoid compilation of test classes 
> (-Dmaven.test.skip=true) because some projects (e.g. flink-test-utils) 
> require test classes in non-test sources (e.g. 
> scalatest_${scala.binary.version}).
> Test classes should be moved to src/test/java (if Java) and src/test/scala 
> (if Scala), and test dependencies should use scope=test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1827] and small fixes in some tests

2016-05-03 Thread zentol
Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/1915#issuecomment-216525073
  
@stefanobortoli yes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3772) Graph algorithms for vertex and edge degree

2016-05-03 Thread Greg Hogan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268685#comment-15268685
 ] 

Greg Hogan commented on FLINK-3772:
---

[~vkalavri], I implemented the degree functions from the {{Graph}} API using 
these algorithms. I did not change the return type.

> Graph algorithms for vertex and edge degree
> ---
>
> Key: FLINK-3772
> URL: https://issues.apache.org/jira/browse/FLINK-3772
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Affects Versions: 1.1.0
>Reporter: Greg Hogan
>Assignee: Greg Hogan
> Fix For: 1.1.0
>
>
> Many graph algorithms require vertices or edges to be marked with the degree. 
> This ticket provides algorithms for annotating
> * vertex degree for undirected graphs
> * vertex out-, in-, and out- and in-degree for directed graphs
> * edge source, target, and source and target degree for undirected graphs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1827) Move test classes in test folders and fix scope of test dependencies

2016-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268666#comment-15268666
 ] 

ASF GitHub Bot commented on FLINK-1827:
---

Github user stefanobortoli commented on the pull request:

https://github.com/apache/flink/pull/1915#issuecomment-216521677
  
 @tillrohrmann, I see the conflicts. How should I deal with this? rebase?


> Move test classes in test folders and fix scope of test dependencies
> 
>
> Key: FLINK-1827
> URL: https://issues.apache.org/jira/browse/FLINK-1827
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Affects Versions: 0.9
>Reporter: Flavio Pompermaier
>Priority: Minor
>  Labels: test-compile
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Right now it is not possible to avoid compilation of test classes 
> (-Dmaven.test.skip=true) because some projects (e.g. flink-test-utils) 
> require test classes in non-test sources (e.g. 
> scalatest_${scala.binary.version}).
> Test classes should be moved to src/test/java (if Java) and src/test/scala 
> (if Scala), and test dependencies should use scope=test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1827] and small fixes in some tests

2016-05-03 Thread stefanobortoli
Github user stefanobortoli commented on the pull request:

https://github.com/apache/flink/pull/1915#issuecomment-216521677
  
 @tillrohrmann, I see the conflicts. How should I deal with this? rebase?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3784) Unexpected results using collect() in RichMapPartitionFunction

2016-05-03 Thread JIRA

[ 
https://issues.apache.org/jira/browse/FLINK-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268656#comment-15268656
 ] 

Sergio Ramírez commented on FLINK-3784:
---

Yes, it is solved. Thanks for the support.

> Unexpected results using collect() in RichMapPartitionFunction
> --
>
> Key: FLINK-3784
> URL: https://issues.apache.org/jira/browse/FLINK-3784
> Project: Flink
>  Issue Type: Bug
>  Components: DataSet API, Machine Learning Library, Scala API
>Affects Versions: 1.0.0
> Environment: Debian 8.3
>Reporter: Sergio Ramírez
>
> The following code (in Scala) outputs unexpected records when it tries to 
> transpose a simple matrix formed by LabeledVector. For each new key (feature, 
> partition), a different number of records is presented, although all new 
> pairs should yield the same number of records since the data is dense (please 
> take a look at the result with a sample dataset).
> {code}
> def mapPartition(it: java.lang.Iterable[LabeledVector], out: Collector[((Int, Int), Int)]): Unit = {
>   val index = getRuntimeContext().getIndexOfThisSubtask() // partition index
>   var ninst = 0
>   for (reg <- it.asScala) {
>     requireByteValues(reg.vector)
>     ninst += 1
>   }
>   for (i <- 0 until nFeatures) out.collect((i, index) -> ninst)
> }
> {code}
> Result: 
> Attribute 10, first seven partitions: 
> ((10,0),201),((10,1),200),((10,2),201),((10,3),200),((10,4),200),((10,5),201),((10,6),201),((10,7),201)
> Attribute 12, first seven partitions: 
> ((12,0),201),((12,1),201),((12,2),201),((12,3),200),((12,4),201),((12,5),200),((12,6),200),((12,7),201)
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (FLINK-3784) Unexpected results using collect() in RichMapPartitionFunction

2016-05-03 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/FLINK-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Ramírez closed FLINK-3784.
-
Resolution: Not A Problem

> Unexpected results using collect() in RichMapPartitionFunction
> --
>
> Key: FLINK-3784
> URL: https://issues.apache.org/jira/browse/FLINK-3784
> Project: Flink
>  Issue Type: Bug
>  Components: DataSet API, Machine Learning Library, Scala API
>Affects Versions: 1.0.0
> Environment: Debian 8.3
>Reporter: Sergio Ramírez
>
> The following code (in Scala) outputs unexpected records when it tries to 
> transpose a simple matrix formed by LabeledVector. For each new key (feature, 
> partition), a different number of records is presented, although all new 
> pairs should yield the same number of records since the data is dense (please 
> take a look at the result with a sample dataset).
> {code}
> def mapPartition(it: java.lang.Iterable[LabeledVector], out: Collector[((Int, Int), Int)]): Unit = {
>   val index = getRuntimeContext().getIndexOfThisSubtask() // partition index
>   var ninst = 0
>   for (reg <- it.asScala) {
>     requireByteValues(reg.vector)
>     ninst += 1
>   }
>   for (i <- 0 until nFeatures) out.collect((i, index) -> ninst)
> }
> {code}
> Result: 
> Attribute 10, first seven partitions: 
> ((10,0),201),((10,1),200),((10,2),201),((10,3),200),((10,4),200),((10,5),201),((10,6),201),((10,7),201)
> Attribute 12, first seven partitions: 
> ((12,0),201),((12,1),201),((12,2),201),((12,3),200),((12,4),201),((12,5),200),((12,6),200),((12,7),201)
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3860) WikipediaEditsSourceTest.testWikipediaEditsSource times out

2016-05-03 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268637#comment-15268637
 ] 

Stephan Ewen commented on FLINK-3860:
-

How about not ignoring it (that effectively means removing the test), but 
adding a retry loop that first checks whether the connection is available 
(socket connect) and if not, skips the test?

That way, we would keep the coverage...
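
A minimal sketch of that idea with JUnit's Assume (host, port, and timeout values 
are placeholders for the Wikipedia IRC endpoint, not taken from the actual test):

{code}
import java.net.{InetSocketAddress, Socket}
import org.junit.Assume

// Probe the endpoint first and skip (rather than fail) the test when unreachable.
def assumeReachable(host: String, port: Int, attempts: Int = 3): Unit = {
  val reachable = (1 to attempts).exists { _ =>
    try {
      val socket = new Socket()
      try { socket.connect(new InetSocketAddress(host, port), 1000); true }
      finally socket.close()
    } catch { case _: java.io.IOException => false }
  }
  Assume.assumeTrue("Endpoint not reachable, skipping test", reachable)
}
{code}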

> WikipediaEditsSourceTest.testWikipediaEditsSource times out
> ---
>
> Key: FLINK-3860
> URL: https://issues.apache.org/jira/browse/FLINK-3860
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.1.0
>Reporter: Vasia Kalavri
>  Labels: test-stability
>
> WikipediaEditsSourceTest.testWikipediaEditsSource consistently timed out on 
> my latest travis build.
> See logs [here|https://travis-ci.org/vasia/flink/builds/127446209].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3856) Create types for java.sql.Date/Time/Timestamp

2016-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268625#comment-15268625
 ] 

ASF GitHub Bot commented on FLINK-3856:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1959#issuecomment-216513630
  
These types would also be useful for `flink-jdbc` and possibly other 
modules. We can move them to a dedicated `TimeTypeInfo` or `SqlTimeTypeInfo` 
class, but I think they should be part of `flink-core`.
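
A rough sketch of the shape such a dedicated class could take; the name, placement, 
and factory used here are illustrative only (in particular, the real constants would 
be backed by dedicated serializers and comparators rather than the generic fallback):

{code}
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.api.java.typeutils.TypeExtractor

// Hypothetical holder class in flink-core: one constant per java.sql.* type.
object SqlTimeTypeInfo {
  val DATE: TypeInformation[java.sql.Date] = TypeExtractor.getForClass(classOf[java.sql.Date])
  val TIME: TypeInformation[java.sql.Time] = TypeExtractor.getForClass(classOf[java.sql.Time])
  val TIMESTAMP: TypeInformation[java.sql.Timestamp] = TypeExtractor.getForClass(classOf[java.sql.Timestamp])
}
{code}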


> Create types for java.sql.Date/Time/Timestamp
> -
>
> Key: FLINK-3856
> URL: https://issues.apache.org/jira/browse/FLINK-3856
> Project: Flink
>  Issue Type: New Feature
>  Components: Core
>Reporter: Timo Walther
>Assignee: Timo Walther
>
> At the moment there is only the {{Date}} type which is not sufficient for 
> most use cases about time.
> The Table API would also benefit from having different types as output result.
> I would propose to add the three {{java.sql.}} types either as {{BasicTypes}} 
> or in an additional class {{TimeTypes}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3856] [core] Create types for java.sql....

2016-05-03 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1959#issuecomment-216513630
  
These types would also be useful for `flink-jdbc` and possibly other 
modules. We can move them to a dedicated `TimeTypeInfo` or `SqlTimeTypeInfo` 
class, but I think they should be part of `flink-core`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3229) Kinesis streaming consumer with integration of Flink's checkpointing mechanics

2016-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268620#comment-15268620
 ] 

ASF GitHub Bot commented on FLINK-3229:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1911#issuecomment-216512837
  
I'm currently working on a custom branch based on this pull request.
It seems that we are running into some dependency issues when using the 
kinesis-connector in AWS EMR.

There seems to be a clash between the protobuf versions (Kinesis needs 
2.6.x, but Flink has 2.5.0 on its classpath).

I'll keep you posted.


> Kinesis streaming consumer with integration of Flink's checkpointing mechanics
> --
>
> Key: FLINK-3229
> URL: https://issues.apache.org/jira/browse/FLINK-3229
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming Connectors
>Affects Versions: 1.0.0
>Reporter: Tzu-Li (Gordon) Tai
>Assignee: Tzu-Li (Gordon) Tai
>
> Opening a sub-task to implement data source consumer for Kinesis streaming 
> connector (https://issues.apache.org/jira/browser/FLINK-3211).
> An example of the planned user API for Flink Kinesis Consumer:
> {code}
> Properties kinesisConfig = new Properties();
> config.put(KinesisConfigConstants.CONFIG_AWS_REGION, "us-east-1");
> config.put(KinesisConfigConstants.CONFIG_AWS_CREDENTIALS_PROVIDER_TYPE, 
> "BASIC");
> config.put(
> KinesisConfigConstants.CONFIG_AWS_CREDENTIALS_PROVIDER_BASIC_ACCESSKEYID, 
> "aws_access_key_id_here");
> config.put(
> KinesisConfigConstants.CONFIG_AWS_CREDENTIALS_PROVIDER_BASIC_SECRETKEY,
> "aws_secret_key_here");
> config.put(KinesisConfigConstants.CONFIG_STREAM_INIT_POSITION_TYPE, 
> "LATEST"); // or TRIM_HORIZON
> DataStream kinesisRecords = env.addSource(new FlinkKinesisConsumer<>(
> "kinesis_stream_name",
> new SimpleStringSchema(),
> kinesisConfig));
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3229] Flink streaming consumer for AWS ...

2016-05-03 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/1911#issuecomment-216512837
  
I'm currently working on a custom branch based on this pull request.
It seems that we are running into some dependency issues when using the 
kinesis-connector in AWS EMR.

There seems to be a clash between the protobuf versions (Kinesis needs 
2.6.x, but Flink has 2.5.0 on its classpath).

I'll keep you posted.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1996) Add output methods to Table API

2016-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268617#comment-15268617
 ] 

ASF GitHub Bot commented on FLINK-1996:
---

GitHub user fhueske opened a pull request:

https://github.com/apache/flink/pull/1961

[FLINK-1996] [tableApi] Add TableSink interface to emit tables to external 
storage.

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [X] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [X] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [X] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fhueske/flink tableSink

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1961.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1961


commit ffae8feddf67c3988a2422b227a1a22190b0e69e
Author: Fabian Hueske 
Date:   2016-04-30T19:11:40Z

[FLINK-1996] [tableApi] Add TableSink interface to emit tables to external 
storage.




> Add output methods to Table API
> ---
>
> Key: FLINK-1996
> URL: https://issues.apache.org/jira/browse/FLINK-1996
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Affects Versions: 0.9
>Reporter: Fabian Hueske
>Assignee: Fabian Hueske
>
> Tables need to be converted to DataSets (or DataStreams) to write them out. 
> It would be good to have a way to emit Table results directly for example to 
> print, CSV, JDBC, HBase, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1996] [tableApi] Add TableSink interfac...

2016-05-03 Thread fhueske
GitHub user fhueske opened a pull request:

https://github.com/apache/flink/pull/1961

[FLINK-1996] [tableApi] Add TableSink interface to emit tables to external 
storage.

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [X] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [X] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [X] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fhueske/flink tableSink

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1961.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1961


commit ffae8feddf67c3988a2422b227a1a22190b0e69e
Author: Fabian Hueske 
Date:   2016-04-30T19:11:40Z

[FLINK-1996] [tableApi] Add TableSink interface to emit tables to external 
storage.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3861) Add Scala's BigInteger and BigDecimal to Scala API

2016-05-03 Thread Timo Walther (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268613#comment-15268613
 ] 

Timo Walther commented on FLINK-3861:
-

[~aljoscha] you are a Scala expert. How would you implement this issue?
At the moment Scala's {{.fromElements(BigDecimal("42"))}} returns a GenericType 
while Java's {{.fromElements(new BigDecimal("42"))}} returns a BasicType. Do we 
need additional serializers for the Scala API?
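
A minimal reproduction of that observation with the Scala DataSet API (assuming the 
current 1.x snapshot, where java.math.BigDecimal is a basic type but 
scala.math.BigDecimal is not):

{code}
import org.apache.flink.api.scala._

object BigDecimalTypeCheck {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    // scala.math.BigDecimal has no dedicated TypeInformation, so the type
    // analyzer falls back to a generic (Kryo-serialized) type:
    val ds = env.fromElements(BigDecimal("42"))
    println(ds.getType()) // expected: a GenericTypeInfo for scala.math.BigDecimal
  }
}
{code}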

> Add Scala's BigInteger and BigDecimal to Scala API
> --
>
> Key: FLINK-3861
> URL: https://issues.apache.org/jira/browse/FLINK-3861
> Project: Flink
>  Issue Type: New Feature
>  Components: Type Serialization System
>Reporter: Timo Walther
>
> In Java we now support {{java.math.BigDecimal/BigInteger}} as basic types. 
> However, Scala wraps these types into {{scala.math.BigDecimal/BigInteger}}. 
> These classes should also be supported to be in sync with the Java API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3856) Create types for java.sql.Date/Time/Timestamp

2016-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268612#comment-15268612
 ] 

ASF GitHub Bot commented on FLINK-3856:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1959#issuecomment-216511830
  
Can these type infos exist independent of the BasicTypeInfo?
Either in some class like SQL type infos, or even only inside the Table API 
/ SQL project?


> Create types for java.sql.Date/Time/Timestamp
> -
>
> Key: FLINK-3856
> URL: https://issues.apache.org/jira/browse/FLINK-3856
> Project: Flink
>  Issue Type: New Feature
>  Components: Core
>Reporter: Timo Walther
>Assignee: Timo Walther
>
> At the moment there is only the {{Date}} type which is not sufficient for 
> most use cases about time.
> The Table API would also benefit from having different types as output result.
> I would propose to add the three {{java.sql.}} types either as {{BasicTypes}} 
> or in an additional class {{TimeTypes}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3856] [core] Create types for java.sql....

2016-05-03 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1959#issuecomment-216511830
  
Can these type infos exist independent of the BasicTypeInfo?
Either in some class like SQL type infos, or even only inside the Table API 
/ SQL project?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3825) Update CEP documentation to include Scala API

2016-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268610#comment-15268610
 ] 

ASF GitHub Bot commented on FLINK-3825:
---

GitHub user StefanRRichter opened a pull request:

https://github.com/apache/flink/pull/1960

[FLINK-3825] Documentation for CEP Scala API.

[FLINK-3825] Documentation for CEP Scala API.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/StefanRRichter/flink dev-cep-scala-doc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1960.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1960


commit 86f32febdfb4b1646e11926bc817b05174433bc2
Author: Stefan Richter 
Date:   2016-05-03T12:21:46Z

[FLINK-3825] Documentation for CEP Scala API.




> Update CEP documentation to include Scala API
> -
>
> Key: FLINK-3825
> URL: https://issues.apache.org/jira/browse/FLINK-3825
> Project: Flink
>  Issue Type: Improvement
>  Components: CEP, Documentation
>Affects Versions: 1.1.0
>Reporter: Till Rohrmann
>Assignee: Stefan Richter
>  Labels: documentation
>
> After adding the Scala CEP API FLINK-3708, we should update the online 
> documentation to also contain the Scala API. This can be done similarly to 
> the {{DataSet}} and {{DataStream}} API by providing Java and Scala code for 
> all examples.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3825] Documentation for CEP Scala API.

2016-05-03 Thread StefanRRichter
GitHub user StefanRRichter opened a pull request:

https://github.com/apache/flink/pull/1960

[FLINK-3825] Documentation for CEP Scala API.

[FLINK-3825] Documentation for CEP Scala API.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/StefanRRichter/flink dev-cep-scala-doc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1960.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1960


commit 86f32febdfb4b1646e11926bc817b05174433bc2
Author: Stefan Richter 
Date:   2016-05-03T12:21:46Z

[FLINK-3825] Documentation for CEP Scala API.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-3861) Add Scala's BigInteger and BigDecimal to Scala API

2016-05-03 Thread Timo Walther (JIRA)
Timo Walther created FLINK-3861:
---

 Summary: Add Scala's BigInteger and BigDecimal to Scala API
 Key: FLINK-3861
 URL: https://issues.apache.org/jira/browse/FLINK-3861
 Project: Flink
  Issue Type: New Feature
  Components: Type Serialization System
Reporter: Timo Walther


In Java we now support {{java.math.BigDecimal/BigInteger}} as basic types. 
However, Scala wraps these types into {{scala.math.BigDecimal/BigInteger}}. 
These classes should also be supported to be in sync with the Java API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3784) Unexpected results using collect() in RichMapPartitionFunction

2016-05-03 Thread Chesnay Schepler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268584#comment-15268584
 ] 

Chesnay Schepler commented on FLINK-3784:
-

I believe this issue was resolved on the mailing list, correct [~sramirez]? 

> Unexpected results using collect() in RichMapPartitionFunction
> --
>
> Key: FLINK-3784
> URL: https://issues.apache.org/jira/browse/FLINK-3784
> Project: Flink
>  Issue Type: Bug
>  Components: DataSet API, Machine Learning Library, Scala API
>Affects Versions: 1.0.0
> Environment: Debian 8.3
>Reporter: Sergio Ramírez
>
> The following code (in Scala) outputs unexpected records when it tries to 
> transpose a simple matrix formed by LabeledVector. For each new key (feature, 
> partition), a different number of records is presented, although all new 
> pairs should yield the same number of records since the data is dense (please 
> take a look at the result with a sample dataset).
> {code}
> def mapPartition(it: java.lang.Iterable[LabeledVector], out: Collector[((Int, Int), Int)]): Unit = {
>   val index = getRuntimeContext().getIndexOfThisSubtask() // partition index
>   var ninst = 0
>   for (reg <- it.asScala) {
>     requireByteValues(reg.vector)
>     ninst += 1
>   }
>   for (i <- 0 until nFeatures) out.collect((i, index) -> ninst)
> }
> {code}
> Result: 
> Attribute 10, first seven partitions: 
> ((10,0),201),((10,1),200),((10,2),201),((10,3),200),((10,4),200),((10,5),201),((10,6),201),((10,7),201)
> Attribute 12, first seven partitions: 
> ((12,0),201),((12,1),201),((12,2),201),((12,3),200),((12,4),201),((12,5),200),((12,6),200),((12,7),201)
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3754][Table]Add a validation phase befo...

2016-05-03 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1958#issuecomment-216506763
  
That's a known issue, see FLINK-3860. No need to worry about this PR. 
I'll have a look soon, thanks for the update!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3754) Add a validation phase before construct RelNode using TableAPI

2016-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268580#comment-15268580
 ] 

ASF GitHub Bot commented on FLINK-3754:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1958#issuecomment-216506763
  
That's a known issue, see FLINK-3860. No need to worry about this PR. 
I'll have a look soon, thanks for the update!


> Add a validation phase before construct RelNode using TableAPI
> --
>
> Key: FLINK-3754
> URL: https://issues.apache.org/jira/browse/FLINK-3754
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Affects Versions: 1.0.0
>Reporter: Yijie Shen
>Assignee: Yijie Shen
>
> Unlike sql string's execution, which have a separate validation phase before 
> RelNode construction, Table API lacks the counterparts and the validation is 
> scattered in many places.
> I suggest to add a single validation phase and detect problems as early as 
> possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3754) Add a validation phase before construct RelNode using TableAPI

2016-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268572#comment-15268572
 ] 

ASF GitHub Bot commented on FLINK-3754:
---

Github user yjshen commented on the pull request:

https://github.com/apache/flink/pull/1958#issuecomment-216505284
  
Test failure due to an unrelated test:
```
[INFO] flink-table ........................... SUCCESS [08:03 min]
[INFO] flink-connector-wikiedits ............. FAILURE [02:02 min]
```
```
Running org.apache.flink.streaming.connectors.wikiedits.WikipediaEditsSourceTest
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 120.051 sec <<< FAILURE! - in org.apache.flink.streaming.connectors.wikiedits.WikipediaEditsSourceTest

testWikipediaEditsSource(org.apache.flink.streaming.connectors.wikiedits.WikipediaEditsSourceTest)  Time elapsed: 120.022 sec  <<< ERROR!
java.lang.Exception: test timed out after 120000 milliseconds
```


> Add a validation phase before construct RelNode using TableAPI
> --
>
> Key: FLINK-3754
> URL: https://issues.apache.org/jira/browse/FLINK-3754
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Affects Versions: 1.0.0
>Reporter: Yijie Shen
>Assignee: Yijie Shen
>
> Unlike sql string's execution, which have a separate validation phase before 
> RelNode construction, Table API lacks the counterparts and the validation is 
> scattered in many places.
> I suggest to add a single validation phase and detect problems as early as 
> possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3754][Table]Add a validation phase befo...

2016-05-03 Thread yjshen
Github user yjshen commented on the pull request:

https://github.com/apache/flink/pull/1958#issuecomment-216505284
  
Test failure due to an unrelated test:
```
[INFO] flink-table ........................... SUCCESS [08:03 min]
[INFO] flink-connector-wikiedits ............. FAILURE [02:02 min]
```
```
Running org.apache.flink.streaming.connectors.wikiedits.WikipediaEditsSourceTest
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 120.051 sec <<< FAILURE! - in org.apache.flink.streaming.connectors.wikiedits.WikipediaEditsSourceTest

testWikipediaEditsSource(org.apache.flink.streaming.connectors.wikiedits.WikipediaEditsSourceTest)  Time elapsed: 120.022 sec  <<< ERROR!
java.lang.Exception: test timed out after 120000 milliseconds
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3808) Refactor the whole file monitoring source to take a fileInputFormat as an argument.

2016-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268567#comment-15268567
 ] 

ASF GitHub Bot commented on FLINK-3808:
---

Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/1929#issuecomment-216503327
  
One related test failure; the hadoop dependency could not be found for 
PROFILE="-Dhadoop.profile=1"


> Refactor the whole file monitoring source to take a fileInputFormat as an 
> argument.
> ---
>
> Key: FLINK-3808
> URL: https://issues.apache.org/jira/browse/FLINK-3808
> Project: Flink
>  Issue Type: Sub-task
>  Components: Streaming
>Reporter: Kostas Kloudas
>Assignee: Kostas Kloudas
>
> This issue is just an intermediate step towards making the file source 
> fault-tolerant.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: FLINK-3808: Refactor the whole file monitoring...

2016-05-03 Thread zentol
Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/1929#issuecomment-216503327
  
One related test failure; the hadoop dependency could not be found for 
PROFILE="-Dhadoop.profile=1"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3754) Add a validation phase before construct RelNode using TableAPI

2016-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268560#comment-15268560
 ] 

ASF GitHub Bot commented on FLINK-3754:
---

Github user yjshen commented on the pull request:

https://github.com/apache/flink/pull/1958#issuecomment-216502690
  
@zentol OK, I've closed it.


> Add a validation phase before construct RelNode using TableAPI
> --
>
> Key: FLINK-3754
> URL: https://issues.apache.org/jira/browse/FLINK-3754
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Affects Versions: 1.0.0
>Reporter: Yijie Shen
>Assignee: Yijie Shen
>
> Unlike sql string's execution, which have a separate validation phase before 
> RelNode construction, Table API lacks the counterparts and the validation is 
> scattered in many places.
> I suggest to add a single validation phase and detect problems as early as 
> possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm

2016-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268553#comment-15268553
 ] 

ASF GitHub Bot commented on FLINK-2044:
---

Github user gallenvara commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61868075
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm is determined by two values,
+ * hubs and authorities. A good hub represents a page that points to many other pages, and a good authority
+ * represents a page that is linked by many different hubs. The implementation assumes that the two values on
+ * every vertex are the same at the beginning.
+ * 
+ * If the number of vertices of the input graph is known, it should be provided as a parameter
+ * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices.
+ */
+public class HITSAlgorithm<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, DoubleValue>>> {
--- End diff --

Done!
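
For reference, the per-iteration updates that this javadoc describes are the 
standard HITS rules (E is the edge set; both score vectors are normalized after 
each step):

{code}
\mathrm{auth}(v) \leftarrow \sum_{(u,v) \in E} \mathrm{hub}(u), \qquad
\mathrm{hub}(u) \leftarrow \sum_{(u,v) \in E} \mathrm{auth}(v)
{code}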


> Implementation of Gelly HITS Algorithm
> --
>
> Key: FLINK-2044
> URL: https://issues.apache.org/jira/browse/FLINK-2044
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Ahamd Javid
>Assignee: GaoLun
>Priority: Minor
>
> Implementation of Hits Algorithm in Gelly API using Java. the feature branch 
> can be found here: (https://github.com/JavidMayar/flink/commits/HITS)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3754][Table]Add a validation phase befo...

2016-05-03 Thread yjshen
Github user yjshen commented on the pull request:

https://github.com/apache/flink/pull/1958#issuecomment-216502690
  
@zentol OK, I've closed it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3404) Extend Kafka consumers with interface StoppableFunction

2016-05-03 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268558#comment-15268558
 ] 

Aljoscha Krettek commented on FLINK-3404:
-

In some way, yes.

> Extend Kafka consumers with interface StoppableFunction
> ---
>
> Key: FLINK-3404
> URL: https://issues.apache.org/jira/browse/FLINK-3404
> Project: Flink
>  Issue Type: Improvement
>  Components: Kafka Connector
>Reporter: Matthias J. Sax
>
> Kafka consumers are not stoppable right now. To make them stoppable, they 
> must implement {{StoppableFunction}}. Implementing method {{stop()}} must 
> ensure, that the consumer stops pulling new messages from Kafka and issues a 
> final checkpoint with the last offset. Afterwards, {{run()}} must return.
> When implementing this, keep in mind, that the gathered checkpoint might 
> later be used as a savepoint.
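
A minimal sketch of the contract described above (illustrative only; the class and 
method bodies are placeholders, not the actual Kafka consumer code):

{code}
// stop() flips a flag; run() finishes its loop, issues a final checkpoint
// with the last offset, and then returns.
class StoppableKafkaLikeSource extends Runnable {
  @volatile private var running = true

  override def run(): Unit = {
    while (running) {
      // poll Kafka, emit records, track the current offset ...
    }
    // issue a final checkpoint with the last offset before returning, so the
    // gathered state can later be used as a savepoint
  }

  def stop(): Unit = {
    running = false // stop pulling new messages from Kafka
  }
}
{code}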



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3860) WikipediaEditsSourceTest.testWikipediaEditsSource times out

2016-05-03 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268555#comment-15268555
 ] 

Aljoscha Krettek commented on FLINK-3860:
-

+1

> WikipediaEditsSourceTest.testWikipediaEditsSource times out
> ---
>
> Key: FLINK-3860
> URL: https://issues.apache.org/jira/browse/FLINK-3860
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.1.0
>Reporter: Vasia Kalavri
>  Labels: test-stability
>
> WikipediaEditsSourceTest.testWikipediaEditsSource consistently timed out on 
> my latest travis build.
> See logs [here|https://travis-ci.org/vasia/flink/builds/127446209].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...

2016-05-03 Thread gallenvara
Github user gallenvara commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61868075
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm is determined by two values,
+ * hubs and authorities. A good hub represents a page that points to many other pages, and a good authority
+ * represents a page that is linked by many different hubs. The implementation assumes that the two values on
+ * every vertex are the same at the beginning.
+ * 
+ * If the number of vertices of the input graph is known, it should be provided as a parameter
+ * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices.
+ */
+public class HITSAlgorithm<K, VV, EV> implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, DoubleValue>>> {
--- End diff --

Done!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3754) Add a validation phase before construct RelNode using TableAPI

2016-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268552#comment-15268552
 ] 

ASF GitHub Bot commented on FLINK-3754:
---

Github user yjshen closed the pull request at:

https://github.com/apache/flink/pull/1916


> Add a validation phase before construct RelNode using TableAPI
> --
>
> Key: FLINK-3754
> URL: https://issues.apache.org/jira/browse/FLINK-3754
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Affects Versions: 1.0.0
>Reporter: Yijie Shen
>Assignee: Yijie Shen
>
> Unlike sql string's execution, which have a separate validation phase before 
> RelNode construction, Table API lacks the counterparts and the validation is 
> scattered in many places.
> I suggest to add a single validation phase and detect problems as early as 
> possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3754][Table]Add a validation phase befo...

2016-05-03 Thread yjshen
Github user yjshen closed the pull request at:

https://github.com/apache/flink/pull/1916


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3754) Add a validation phase before construct RelNode using TableAPI

2016-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268551#comment-15268551
 ] 

ASF GitHub Bot commented on FLINK-3754:
---

Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/1958#issuecomment-216501884
  
Since this PR is a substitute, could you close the old one? Thanks.


> Add a validation phase before construct RelNode using TableAPI
> --
>
> Key: FLINK-3754
> URL: https://issues.apache.org/jira/browse/FLINK-3754
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Affects Versions: 1.0.0
>Reporter: Yijie Shen
>Assignee: Yijie Shen
>
> Unlike SQL string execution, which has a separate validation phase before 
> RelNode construction, the Table API lacks a counterpart, and its validation is 
> scattered across many places.
> I suggest adding a single validation phase to detect problems as early as 
> possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3754][Table]Add a validation phase befo...

2016-05-03 Thread zentol
Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/1958#issuecomment-216501884
  
Since this PR is a substitute, could you close the old one? Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm

2016-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268540#comment-15268540
 ] 

ASF GitHub Bot commented on FLINK-2044:
---

Github user gallenvara commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61866752
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm assigns two scores to
+ * every vertex: a hub score and an authority score. A good hub represents a page that points to
+ * many other pages, and a good authority represents a page that is linked by many different hubs.
+ * The implementation assumes that the two values on every vertex are the same at the beginning.
+ *
+ * If the number of vertices of the input graph is known, it should be provided as a parameter
+ * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices.
+ */
+public class HITSAlgorithm<K, VV> implements GraphAlgorithm<K, VV, NullValue, DataSet<Vertex<K, DoubleValue>>> {
--- End diff --

yes, you are right.


> Implementation of Gelly HITS Algorithm
> --
>
> Key: FLINK-2044
> URL: https://issues.apache.org/jira/browse/FLINK-2044
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Ahamd Javid
>Assignee: GaoLun
>Priority: Minor
>
> Implementation of the HITS algorithm in the Gelly API using Java. The feature branch 
> can be found here: (https://github.com/JavidMayar/flink/commits/HITS)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...

2016-05-03 Thread gallenvara
Github user gallenvara commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61866752
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm assigns two scores to
+ * every vertex: a hub score and an authority score. A good hub represents a page that points to
+ * many other pages, and a good authority represents a page that is linked by many different hubs.
+ * The implementation assumes that the two values on every vertex are the same at the beginning.
+ *
+ * If the number of vertices of the input graph is known, it should be provided as a parameter
+ * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices.
+ */
+public class HITSAlgorithm<K, VV> implements GraphAlgorithm<K, VV, NullValue, DataSet<Vertex<K, DoubleValue>>> {
--- End diff --

yes, you are right.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm

2016-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268535#comment-15268535
 ] 

ASF GitHub Bot commented on FLINK-2044:
---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61866283
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm assigns two scores to
+ * every vertex: a hub score and an authority score. A good hub represents a page that points to
+ * many other pages, and a good authority represents a page that is linked by many different hubs.
+ * The implementation assumes that the two values on every vertex are the same at the beginning.
+ *
+ * If the number of vertices of the input graph is known, it should be provided as a parameter
+ * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices.
+ */
+public class HITSAlgorithm<K, VV> implements GraphAlgorithm<K, VV, NullValue, DataSet<Vertex<K, DoubleValue>>> {
--- End diff --

It can be parametrized with "EV" and the algorithm can set it to 
`NullValue` internally. This way, users won't have to first map their input 
graphs to `NullValue` edge value types.
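
A hedged sketch of that suggestion, assuming the Flink 1.x Gelly API (`GraphAlgorithm`, `Graph#mapEdges`): the algorithm accepts any edge value type EV and discards it internally, so callers do not have to pre-map their graphs. The class name is illustrative and the iteration body is elided.

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.graph.Edge;
    import org.apache.flink.graph.Graph;
    import org.apache.flink.graph.GraphAlgorithm;
    import org.apache.flink.graph.Vertex;
    import org.apache.flink.types.DoubleValue;
    import org.apache.flink.types.NullValue;

    public class HITSOnAnyEdgeValue<K, VV, EV> implements
            GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, DoubleValue>>> {

        @Override
        public DataSet<Vertex<K, DoubleValue>> run(Graph<K, VV, EV> input) throws Exception {
            // Translate every edge value to NullValue once, inside the algorithm.
            Graph<K, VV, NullValue> graph = input.mapEdges(
                    new MapFunction<Edge<K, EV>, NullValue>() {
                        @Override
                        public NullValue map(Edge<K, EV> edge) {
                            return NullValue.getInstance();
                        }
                    });
            // ... run the scatter-gather iteration on 'graph' as before ...
            return null; // placeholder for the real result set
        }
    }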


> Implementation of Gelly HITS Algorithm
> --
>
> Key: FLINK-2044
> URL: https://issues.apache.org/jira/browse/FLINK-2044
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Ahamd Javid
>Assignee: GaoLun
>Priority: Minor
>
> Implementation of the HITS algorithm in the Gelly API using Java. The feature branch 
> can be found here: (https://github.com/JavidMayar/flink/commits/HITS)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...

2016-05-03 Thread vasia
Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61866283
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm assigns two scores to
+ * every vertex: a hub score and an authority score. A good hub represents a page that points to
+ * many other pages, and a good authority represents a page that is linked by many different hubs.
+ * The implementation assumes that the two values on every vertex are the same at the beginning.
+ *
+ * If the number of vertices of the input graph is known, it should be provided as a parameter
+ * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices.
+ */
+public class HITSAlgorithm<K, VV> implements GraphAlgorithm<K, VV, NullValue, DataSet<Vertex<K, DoubleValue>>> {
--- End diff --

It can be parametrized with "EV" and the algorithm can set it to 
`NullValue` internally. This way, users won't have to first map their input 
graphs to `NullValue` edge value types.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm

2016-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268524#comment-15268524
 ] 

ASF GitHub Bot commented on FLINK-2044:
---

Github user gallenvara commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61865225
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm assigns two scores to
+ * every vertex: a hub score and an authority score. A good hub represents a page that points to
+ * many other pages, and a good authority represents a page that is linked by many different hubs.
+ * The implementation assumes that the two values on every vertex are the same at the beginning.
+ *
+ * If the number of vertices of the input graph is known, it should be provided as a parameter
+ * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices.
+ */
+public class HITSAlgorithm<K, VV> implements GraphAlgorithm<K, VV, NullValue, DataSet<Vertex<K, DoubleValue>>> {
--- End diff --

The edge value is not used throughout the process. It would be better to 
hard-code it to `NullValue`, IMO.


> Implementation of Gelly HITS Algorithm
> --
>
> Key: FLINK-2044
> URL: https://issues.apache.org/jira/browse/FLINK-2044
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Ahamd Javid
>Assignee: GaoLun
>Priority: Minor
>
> Implementation of the HITS algorithm in the Gelly API using Java. The feature branch 
> can be found here: (https://github.com/JavidMayar/flink/commits/HITS)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...

2016-05-03 Thread gallenvara
Github user gallenvara commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61865225
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm assigns two scores to
+ * every vertex: a hub score and an authority score. A good hub represents a page that points to
+ * many other pages, and a good authority represents a page that is linked by many different hubs.
+ * The implementation assumes that the two values on every vertex are the same at the beginning.
+ *
+ * If the number of vertices of the input graph is known, it should be provided as a parameter
+ * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices.
+ */
+public class HITSAlgorithm<K, VV> implements GraphAlgorithm<K, VV, NullValue, DataSet<Vertex<K, DoubleValue>>> {
--- End diff --

The edge value is not used throughout the process. It would be better to 
hard-code it to `NullValue`, IMO.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3856) Create types for java.sql.Date/Time/Timestamp

2016-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268523#comment-15268523
 ] 

ASF GitHub Bot commented on FLINK-3856:
---

GitHub user twalthr opened a pull request:

https://github.com/apache/flink/pull/1959

[FLINK-3856] [core] Create types for java.sql.Date/Time/Timestamp

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [x] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [x] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [x] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed

This PR adds java.sql.Date/Time/Timestamp as basic types. I declared them 
PublicEvolving; therefore I didn't add the types to the documentation. I 
improved the Date serialization to use Long.MIN_VALUE instead of -1, but it 
still does not solve FLINK-3858 completely.
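
A hypothetical sketch of why Long.MIN_VALUE is the safer null marker when a java.sql.Date is serialized as epoch milliseconds: -1 is itself a legal timestamp (one millisecond before 1970-01-01T00:00:00Z), whereas Long.MIN_VALUE lies outside any date a user would realistically encode. The class and method names below are illustrative, not Flink's actual serializer.

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import java.sql.Date;

    public final class DateSerializationSketch {

        // Sentinel for null; -1 would collide with a real instant.
        private static final long NULL_MARKER = Long.MIN_VALUE;

        public static void serialize(Date value, DataOutput out) throws IOException {
            out.writeLong(value == null ? NULL_MARKER : value.getTime());
        }

        public static Date deserialize(DataInput in) throws IOException {
            long millis = in.readLong();
            return millis == NULL_MARKER ? null : new Date(millis);
        }
    }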



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/twalthr/flink DateTimeTimestamp

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1959.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1959


commit b294fe3f1988582baf4b1d948d95e1efd5293d80
Author: twalthr 
Date:   2016-05-02T14:31:45Z

[FLINK-3856] [core] Create types for java.sql.Date/Time/Timestamp




> Create types for java.sql.Date/Time/Timestamp
> -
>
> Key: FLINK-3856
> URL: https://issues.apache.org/jira/browse/FLINK-3856
> Project: Flink
>  Issue Type: New Feature
>  Components: Core
>Reporter: Timo Walther
>Assignee: Timo Walther
>
> At the moment there is only the {{Date}} type which is not sufficient for 
> most use cases about time.
> The Table API would also benefit from having different types as output result.
> I would propose to add the three {{java.sql.}} types either as {{BasicTypes}} 
> or in an additional class {{TimeTypes}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3856] [core] Create types for java.sql....

2016-05-03 Thread twalthr
GitHub user twalthr opened a pull request:

https://github.com/apache/flink/pull/1959

[FLINK-3856] [core] Create types for java.sql.Date/Time/Timestamp

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following check list into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [x] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [x] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [x] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed

This PR adds java.sql.Date/Time/Timestamp as basic types. I declared them 
PublicEvolving; therefore I didn't add the types to the documentation. I 
improved the Date serialization to use Long.MIN_VALUE instead of -1, but it 
still does not solve FLINK-3858 completely.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/twalthr/flink DateTimeTimestamp

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1959.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1959


commit b294fe3f1988582baf4b1d948d95e1efd5293d80
Author: twalthr 
Date:   2016-05-02T14:31:45Z

[FLINK-3856] [core] Create types for java.sql.Date/Time/Timestamp




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3404) Extend Kafka consumers with interface StoppableFunction

2016-05-03 Thread Martin Liesenberg (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268522#comment-15268522
 ] 

Martin Liesenberg commented on FLINK-3404:
--

OK, I didn't realize that needed to be done first.

And just out of curiosity: what it boils down to is that some version of 
distributed consensus needs to be implemented here first?

> Extend Kafka consumers with interface StoppableFunction
> ---
>
> Key: FLINK-3404
> URL: https://issues.apache.org/jira/browse/FLINK-3404
> Project: Flink
>  Issue Type: Improvement
>  Components: Kafka Connector
>Reporter: Matthias J. Sax
>
> Kafka consumers are not stoppable right now. To make them stoppable, they 
> must implement {{StoppableFunction}}. Implementing method {{stop()}} must 
> ensure, that the consumer stops pulling new messages from Kafka and issues a 
> final checkpoint with the last offset. Afterwards, {{run()}} must return.
> When implementing this, keep in mind, that the gathered checkpoint might 
> later be used as a savepoint.
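
A hedged sketch of the stop() contract described above, assuming Flink 1.x's SourceFunction and StoppableFunction interfaces: stop() flips a flag, run() stops pulling and returns, and the final offsets are captured by the regular checkpoint path. The real Kafka consumer is far more involved; pollNextRecord() is a made-up stand-in for the fetch logic.

    import org.apache.flink.api.common.functions.StoppableFunction;
    import org.apache.flink.streaming.api.functions.source.SourceFunction;

    public class StoppableSourceSketch implements SourceFunction<String>, StoppableFunction {

        private volatile boolean running = true;

        @Override
        public void run(SourceContext<String> ctx) throws Exception {
            while (running) {
                // Emit under the checkpoint lock so offsets and records stay consistent.
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(pollNextRecord());
                }
            }
            // Returning here ends the source cleanly, as the issue requires.
        }

        @Override
        public void stop() {
            running = false; // graceful stop: drain and return from run()
        }

        @Override
        public void cancel() {
            running = false; // hard cancel follows the same path in this sketch
        }

        private String pollNextRecord() {
            return "record"; // hypothetical fetch
        }
    }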



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-2044) Implementation of Gelly HITS Algorithm

2016-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268518#comment-15268518
 ] 

ASF GitHub Bot commented on FLINK-2044:
---

Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61864334
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm assigns two scores to
+ * every vertex: a hub score and an authority score. A good hub represents a page that points to
+ * many other pages, and a good authority represents a page that is linked by many different hubs.
+ * The implementation assumes that the two values on every vertex are the same at the beginning.
+ *
+ * If the number of vertices of the input graph is known, it should be provided as a parameter
+ * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices.
+ */
+public class HITSAlgorithm<K, VV> implements GraphAlgorithm<K, VV, NullValue, DataSet<Vertex<K, DoubleValue>>> {
--- End diff --

Haven't looked at your latest commit, but you can parameterize with "EV" as 
you have with "K".


> Implementation of Gelly HITS Algorithm
> --
>
> Key: FLINK-2044
> URL: https://issues.apache.org/jira/browse/FLINK-2044
> Project: Flink
>  Issue Type: New Feature
>  Components: Gelly
>Reporter: Ahamd Javid
>Assignee: GaoLun
>Priority: Minor
>
> Implementation of the HITS algorithm in the Gelly API using Java. The feature branch 
> can be found here: (https://github.com/JavidMayar/flink/commits/HITS)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-2044] [gelly] Implementation of Gelly H...

2016-05-03 Thread greghogan
Github user greghogan commented on a diff in the pull request:

https://github.com/apache/flink/pull/1956#discussion_r61864334
  
--- Diff: 
flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/HITSAlgorithm.java
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.library;
+
+import org.apache.flink.api.common.aggregators.DoubleSumAggregator;
+import org.apache.flink.api.java.DataSet;
+
+import org.apache.flink.graph.Edge;
+import org.apache.flink.graph.EdgeDirection;
+import org.apache.flink.graph.Graph;
+import org.apache.flink.graph.GraphAlgorithm;
+import org.apache.flink.graph.Vertex;
+import org.apache.flink.graph.spargel.MessageIterator;
+import org.apache.flink.graph.spargel.MessagingFunction;
+import org.apache.flink.graph.spargel.ScatterGatherConfiguration;
+import org.apache.flink.graph.spargel.VertexUpdateFunction;
+import org.apache.flink.types.DoubleValue;
+
+/**
+ * This is an implementation of the HITS algorithm, using a scatter-gather iteration.
+ * The user can define the maximum number of iterations. The HITS algorithm assigns two scores to
+ * every vertex: a hub score and an authority score. A good hub represents a page that points to
+ * many other pages, and a good authority represents a page that is linked by many different hubs.
+ * The implementation assumes that the two values on every vertex are the same at the beginning.
+ *
+ * If the number of vertices of the input graph is known, it should be provided as a parameter
+ * to speed up computation. Otherwise, the algorithm will first execute a job to count the vertices.
+ */
+public class HITSAlgorithm<K, VV> implements GraphAlgorithm<K, VV, NullValue, DataSet<Vertex<K, DoubleValue>>> {
--- End diff --

Haven't looked at your latest commit, but you can parameterize with "EV" as 
you have with "K".


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3860) WikipediaEditsSourceTest.testWikipediaEditsSource times out

2016-05-03 Thread Chesnay Schepler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268494#comment-15268494
 ] 

Chesnay Schepler commented on FLINK-3860:
-

I agree that we should @Ignore it.

> WikipediaEditsSourceTest.testWikipediaEditsSource times out
> ---
>
> Key: FLINK-3860
> URL: https://issues.apache.org/jira/browse/FLINK-3860
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.1.0
>Reporter: Vasia Kalavri
>  Labels: test-stability
>
> WikipediaEditsSourceTest.testWikipediaEditsSource consistently timed out on 
> my latest travis build.
> See logs [here| https://travis-ci.org/vasia/flink/builds/127446209].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3860) WikipediaEditsSourceTest.testWikipediaEditsSource times out

2016-05-03 Thread Ufuk Celebi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268489#comment-15268489
 ] 

Ufuk Celebi commented on FLINK-3860:


Yes, that's problematic for other reasons as well. I would like to add an 
@Ignore to this test. It's not a core Flink feature, and we won't make it 
reliable as long as it relies on an external service. We could write a mock IRC 
server, but since it's not a core Flink thing, I don't think anyone will 
invest the time.
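
For reference, a minimal illustration of the proposal, assuming the JUnit 4 @Ignore annotation that Flink's tests use; the test name mirrors the one in this issue, and the body is elided:

    import org.junit.Ignore;
    import org.junit.Test;

    public class WikipediaEditsSourceTestSketch {

        @Ignore("Relies on the external Wikipedia IRC service and times out on CI")
        @Test
        public void testWikipediaEditsSource() throws Exception {
            // test body elided; it never runs while @Ignore is present
        }
    }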

> WikipediaEditsSourceTest.testWikipediaEditsSource times out
> ---
>
> Key: FLINK-3860
> URL: https://issues.apache.org/jira/browse/FLINK-3860
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.1.0
>Reporter: Vasia Kalavri
>  Labels: test-stability
>
> WikipediaEditsSourceTest.testWikipediaEditsSource consistently timed out on 
> my latest travis build.
> See logs [here| https://travis-ci.org/vasia/flink/builds/127446209].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3860) WikipediaEditsSourceTest.testWikipediaEditsSource times out

2016-05-03 Thread Chesnay Schepler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268472#comment-15268472
 ] 

Chesnay Schepler commented on FLINK-3860:
-

This may simply be an availability issue, as the test relies on external data, 
AFAIK.

> WikipediaEditsSourceTest.testWikipediaEditsSource times out
> ---
>
> Key: FLINK-3860
> URL: https://issues.apache.org/jira/browse/FLINK-3860
> Project: Flink
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 1.1.0
>Reporter: Vasia Kalavri
>  Labels: test-stability
>
> WikipediaEditsSourceTest.testWikipediaEditsSource consistently timed out on 
> my latest travis build.
> See logs [here| https://travis-ci.org/vasia/flink/builds/127446209].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: Jackson version upgrade: default from 2.4.2 to...

2016-05-03 Thread smarthi
Github user smarthi commented on the pull request:

https://github.com/apache/flink/pull/1952#issuecomment-216487812
  
The Wikimedia IRC channel is timing out, hence the Wiki test failures; they 
have nothing to do with this PR.

LGTM IMO


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3754) Add a validation phase before construct RelNode using TableAPI

2016-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268465#comment-15268465
 ] 

ASF GitHub Bot commented on FLINK-3754:
---

Github user yjshen commented on the pull request:

https://github.com/apache/flink/pull/1958#issuecomment-216487312
  
This PR substitutes #1916, squashing several previous commits into a single 
one for easier rebasing.

@fhueske I've implemented eager validation here; can you take a look at 
this one?


> Add a validation phase before construct RelNode using TableAPI
> --
>
> Key: FLINK-3754
> URL: https://issues.apache.org/jira/browse/FLINK-3754
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Affects Versions: 1.0.0
>Reporter: Yijie Shen
>Assignee: Yijie Shen
>
> Unlike SQL string execution, which has a separate validation phase before 
> RelNode construction, the Table API lacks a counterpart, and its validation is 
> scattered across many places.
> I suggest adding a single validation phase to detect problems as early as 
> possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3754][Table]Add a validation phase befo...

2016-05-03 Thread yjshen
Github user yjshen commented on the pull request:

https://github.com/apache/flink/pull/1958#issuecomment-216487312
  
This PR substitutes #1916, squashing several previous commits into a single 
one for easier rebasing.

@fhueske I've implemented eager validation here; can you take a look at 
this one?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3754) Add a validation phase before construct RelNode using TableAPI

2016-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268456#comment-15268456
 ] 

ASF GitHub Bot commented on FLINK-3754:
---

GitHub user yjshen opened a pull request:

https://github.com/apache/flink/pull/1958

[FLINK-3754][Table]Add a validation phase before construct RelNode using 
TableAPI

This PR aims at adding an extra **validation** phase for plans generated 
from the Table API, matching the functionality of Calcite's Validator, which 
is called when we execute a query expressed as a SQL string.

In order to do this, I inserted a new layer between the Table API and `RelNode` 
construction: the `Logical Plan`.

The main validation procedure works as follows:

1. Construct a logical plan node
2. Resolve it using the schema and `FunctionCatalog`
3. Validate the type-annotated logical plan node

Once validation finishes successfully, it's safe to construct the 
`RelNode`.
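
A hypothetical Java sketch of that three-step flow: construct, resolve, validate, and only then build the RelNode. Every name below is illustrative rather than the PR's actual types; the point is only that nothing reaches Calcite until resolution and validation have both succeeded.

    import java.util.HashMap;
    import java.util.Map;

    interface LogicalNode {
        LogicalNode resolve(FunctionCatalog catalog); // step 2: resolution
        void validate();                              // step 3: validation
    }

    final class FunctionCatalog {
        private final Map<String, Object> functions = new HashMap<>();

        Object lookup(String name) {
            return functions.get(name); // registered functions, elided
        }
    }

    final class PlannerSketch {
        Object toRelNode(LogicalNode plan, FunctionCatalog catalog) {
            LogicalNode resolved = plan.resolve(catalog); // fail fast on unknown names
            resolved.validate();                          // fail fast on type errors
            return buildRelNode(resolved);                // only validated plans reach Calcite
        }

        private Object buildRelNode(LogicalNode plan) {
            return new Object(); // stand-in for the Calcite RelNode builder
        }
    }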

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yjshen/flink eager_validation

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1958.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1958


commit ec6bf418e065d836f5399275d4ab24b9c29ab0fe
Author: Yijie Shen 
Date:   2016-04-13T08:46:58Z

Add an extra validation phase before construct RelNode.

Squash previous commits into single one for easier rebase.
The eight previous commits are:

  make TreeNode extends Product
  wip expressions validation, should create expressions for functions next
  add functions for math and string
  wip move table api on logicalNode
  resolve and validate next
  wip
  fix bug in validator, merge eval, add doc
  resolve comments

commit d31a782475903b16f57fee819fc1ddb830aaa597
Author: Yijie Shen 
Date:   2016-05-03T08:57:02Z

do eager validation




> Add a validation phase before construct RelNode using TableAPI
> --
>
> Key: FLINK-3754
> URL: https://issues.apache.org/jira/browse/FLINK-3754
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Affects Versions: 1.0.0
>Reporter: Yijie Shen
>Assignee: Yijie Shen
>
> Unlike SQL string execution, which has a separate validation phase before 
> RelNode construction, the Table API lacks a counterpart, and its validation is 
> scattered across many places.
> I suggest adding a single validation phase to detect problems as early as 
> possible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3754][Table]Add a validation phase befo...

2016-05-03 Thread yjshen
GitHub user yjshen opened a pull request:

https://github.com/apache/flink/pull/1958

[FLINK-3754][Table]Add a validation phase before construct RelNode using 
TableAPI

This PR aims at adding an extra **validation** phase for plans generated 
from the Table API, matching the functionality of Calcite's Validator, which 
is called when we execute a query expressed as a SQL string.

In order to do this, I inserted a new layer between the Table API and `RelNode` 
construction: the `Logical Plan`.

The main validation procedure works as follows:

1. Construct a logical plan node
2. Resolve it using the schema and `FunctionCatalog`
3. Validate the type-annotated logical plan node

Once validation finishes successfully, it's safe to construct the 
`RelNode`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yjshen/flink eager_validation

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1958.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1958


commit ec6bf418e065d836f5399275d4ab24b9c29ab0fe
Author: Yijie Shen 
Date:   2016-04-13T08:46:58Z

Add an extra validation phase before construct RelNode.

Squash previous commits into single one for easier rebase.
The eight previous commits are:

  make TreeNode extends Product
  wip expressions validation, should create expressions for functions next
  add functions for math and string
  wip move table api on logicalNode
  resolve and validate next
  wip
  fix bug in validator, merge eval, add doc
  resolve comments

commit d31a782475903b16f57fee819fc1ddb830aaa597
Author: Yijie Shen 
Date:   2016-05-03T08:57:02Z

do eager validation




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-3860) WikipediaEditsSourceTest.testWikipediaEditsSource times out

2016-05-03 Thread Vasia Kalavri (JIRA)
Vasia Kalavri created FLINK-3860:


 Summary: WikipediaEditsSourceTest.testWikipediaEditsSource times 
out
 Key: FLINK-3860
 URL: https://issues.apache.org/jira/browse/FLINK-3860
 Project: Flink
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.1.0
Reporter: Vasia Kalavri


WikipediaEditsSourceTest.testWikipediaEditsSource consistently timed out on my 
latest travis build.
See logs [here| https://travis-ci.org/vasia/flink/builds/127446209].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3669) WindowOperator registers a lot of timers at StreamTask

2016-05-03 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268419#comment-15268419
 ] 

Aljoscha Krettek commented on FLINK-3669:
-

Ok, I think it's good to merge then. :-)

> WindowOperator registers a lot of timers at StreamTask
> --
>
> Key: FLINK-3669
> URL: https://issues.apache.org/jira/browse/FLINK-3669
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.0.1
>Reporter: Aljoscha Krettek
>Assignee: Konstantin Knauf
>Priority: Blocker
>
> Right now, the WindowOperator registers a timer at the StreamTask for every 
> processing-time timer that a Trigger registers. We should combine several 
> registered trigger timers to only register one low-level timer (timer 
> coalescing).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1827) Move test classes in test folders and fix scope of test dependencies

2016-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268410#comment-15268410
 ] 

ASF GitHub Bot commented on FLINK-1827:
---

Github user stefanobortoli commented on a diff in the pull request:

https://github.com/apache/flink/pull/1915#discussion_r61855824
  
--- Diff: 
flink-runtime/src/test/java/org/apache/flink/runtime/util/StartupUtils.java ---
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.util;
+
+import java.util.List;
+
+public class StartupUtils {
+   
+   /**
+    * A utility method to analyze an exception and collect its causes.
+    *
+    * @param e  the root exception (Throwable) object
+    * @param causes  the list of exceptions that caused the root exception
+    * @return  a list of Throwable
+    */
+   public List<Throwable> getExceptionCauses(Throwable e, List<Throwable> causes) {
--- End diff --

on it
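
The diff elides the method body; below is a minimal sketch of what such a cause-collecting helper might do (walk Throwable#getCause until it runs out or repeats). This is an assumption about the implementation, not the PR's actual code.

    import java.util.ArrayList;
    import java.util.List;

    public class StartupUtilsSketch {

        public static List<Throwable> getExceptionCauses(Throwable e, List<Throwable> causes) {
            Throwable cause = e.getCause();
            while (cause != null && !causes.contains(cause)) { // guard against cyclic causes
                causes.add(cause);
                cause = cause.getCause();
            }
            return causes;
        }

        public static void main(String[] args) {
            Exception root = new Exception("startup failed",
                    new IllegalStateException("port in use"));
            System.out.println(getExceptionCauses(root, new ArrayList<Throwable>()));
        }
    }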


> Move test classes in test folders and fix scope of test dependencies
> 
>
> Key: FLINK-1827
> URL: https://issues.apache.org/jira/browse/FLINK-1827
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Affects Versions: 0.9
>Reporter: Flavio Pompermaier
>Priority: Minor
>  Labels: test-compile
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Right now it is not possible to avoid compilation of test classes 
> (-Dmaven.test.skip=true) because some project (e.g. flink-test-utils) 
> requires test classes in non-test sources (e.g. 
> scalatest_${scala.binary.version})
> Test classes should be moved to src/main/test (if Java) and src/test/scala 
> (if scala) and use scope=test for test dependencies



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1827] and small fixes in some tests

2016-05-03 Thread stefanobortoli
Github user stefanobortoli commented on a diff in the pull request:

https://github.com/apache/flink/pull/1915#discussion_r61855824
  
--- Diff: 
flink-runtime/src/test/java/org/apache/flink/runtime/util/StartupUtils.java ---
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.runtime.util;
+
+import java.util.List;
+
+public class StartupUtils {
+   
+   /**
+    * A utility method to analyze an exception and collect its causes.
+    *
+    * @param e  the root exception (Throwable) object
+    * @param causes  the list of exceptions that caused the root exception
+    * @return  a list of Throwable
+    */
+   public List<Throwable> getExceptionCauses(Throwable e, List<Throwable> causes) {
--- End diff --

on it


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1827) Move test classes in test folders and fix scope of test dependencies

2016-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15268408#comment-15268408
 ] 

ASF GitHub Bot commented on FLINK-1827:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/1915#issuecomment-216475737
  
Modulo one more inline comment, I think it looks good.


> Move test classes in test folders and fix scope of test dependencies
> 
>
> Key: FLINK-1827
> URL: https://issues.apache.org/jira/browse/FLINK-1827
> Project: Flink
>  Issue Type: Improvement
>  Components: Build System
>Affects Versions: 0.9
>Reporter: Flavio Pompermaier
>Priority: Minor
>  Labels: test-compile
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Right now it is not possible to avoid compilation of test classes 
> (-Dmaven.test.skip=true) because some project (e.g. flink-test-utils) 
> requires test classes in non-test sources (e.g. 
> scalatest_${scala.binary.version})
> Test classes should be moved to src/main/test (if Java) and src/test/scala 
> (if scala) and use scope=test for test dependencies



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

