Re: Limitations on grouped ReduceFunction

2016-02-03 Thread Stephan Ewen
For now, I think it is worth documenting and leaving it as it is.

A while back we thought about adding a Static Code Analysis rule to find
such cases and create a warning. For Reduce, that is quite straightforward,
for GrouReduce quite tricky...
Am 02.02.2016 21:55 schrieb "Greg Hogan" :

> If a user modifies keyed fields of a grouped reduce during a combine then
> the reduce will receive incorrect groupings. For example, a useless
> modification to word count:
>
>   public WC reduce(WC in1, WC in2) {
> return new WC(in1.word + " " + in2.word, in1.count + in2.count);
>   }
>
> I don't see an efficient means to prevent this. Is this limitation worth
> documenting, or can we safely assume that no one will ever attempt this?
> MapReduce also has this limitation, and Spark gets around this by
> separating keys and values and only presenting values to reduce.
>
> "Reduce on Grouped DataSet: A Reduce transformation that is applied on a
> grouped DataSet reduces each group to a single element using a user-defined
> reduce function. For each group of input elements, a reduce function
> successively combines pairs of elements into one element until only a
> single element for each group remains."
>
> Greg
>


Re: Connector Documentation missing

2016-02-03 Thread Suneel Marthi
Most of the flume code is commented out IIRC?

Sent from my iPhone

> On Feb 3, 2016, at 8:24 AM, Matthias J. Sax  wrote:
> 
> Hi,
> 
> I just observed that there are 7 flink-streaming-connectors available
> but only 5 are documented on the web page.
> 
> Flume and Nifi are not documented. Did we miss to extend the
> documentation for both (which should have been part of the commit of the
> code) or was this left out on purpose?
> 
> -Matthias
> 
> 


Re: Connector Documentation missing

2016-02-03 Thread Maximilian Michels
There is currently only a FlumeSink. The FlumeSource is a dummy file
(copied from the AvroSource) and needs to be removed.

On Wed, Feb 3, 2016 at 3:11 PM, Matthias J. Sax  wrote:

> FlumeSink is there. FlumeSource and FlumeTopology is all put in
> comments... Not sure about it.
>
> There is no JIRA for a Flume connector... Does anyone know the status of
> Flume connector?
>
>
> On 02/03/2016 02:45 PM, Suneel Marthi wrote:
> > Most of the flume code is commented out IIRC?
> >
> > Sent from my iPhone
> >
> >> On Feb 3, 2016, at 8:24 AM, Matthias J. Sax  wrote:
> >>
> >> Hi,
> >>
> >> I just observed that there are 7 flink-streaming-connectors available
> >> but only 5 are documented on the web page.
> >>
> >> Flume and Nifi are not documented. Did we miss to extend the
> >> documentation for both (which should have been part of the commit of the
> >> code) or was this left out on purpose?
> >>
> >> -Matthias
> >>
> >>
>
>


[jira] [Created] (FLINK-3326) Kafka Consumer doesn't retrieve messages after killing flink job which was using the same topic

2016-02-03 Thread Farouk Salem (JIRA)
Farouk Salem created FLINK-3326:
---

 Summary: Kafka Consumer doesn't retrieve messages after killing 
flink job which was using the same topic
 Key: FLINK-3326
 URL: https://issues.apache.org/jira/browse/FLINK-3326
 Project: Flink
  Issue Type: Bug
  Components: flink-contrib
Affects Versions: 0.10.1
 Environment: Java
Reporter: Farouk Salem
Priority: Blocker


If there is a streaming job which retrieving data from a Kafka topic (from the 
smallest offest) and this job is killed manually, starting another job (or the 
same job) after the killed job will not be able to consume data from the same 
topic starting from smallest offset.

I tried this behavior more than 10 times and each time, it failed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Connector Documentation missing

2016-02-03 Thread Maximilian Michels
Sorry, I don't think it was purposely not documented :) It needs to be
added. Could you file JIRAs and perhaps link them with the Flume/Nifi
JIRAs? You could also reopen those..

On Wed, Feb 3, 2016 at 3:21 PM, Matthias J. Sax  wrote:

> Thanks for clarification! What about the missing documentation?
>
> On 02/03/2016 03:18 PM, Maximilian Michels wrote:
> > There is currently only a FlumeSink. The FlumeSource is a dummy file
> > (copied from the AvroSource) and needs to be removed.
> >
> > On Wed, Feb 3, 2016 at 3:11 PM, Matthias J. Sax 
> wrote:
> >
> >> FlumeSink is there. FlumeSource and FlumeTopology is all put in
> >> comments... Not sure about it.
> >>
> >> There is no JIRA for a Flume connector... Does anyone know the status of
> >> Flume connector?
> >>
> >>
> >> On 02/03/2016 02:45 PM, Suneel Marthi wrote:
> >>> Most of the flume code is commented out IIRC?
> >>>
> >>> Sent from my iPhone
> >>>
>  On Feb 3, 2016, at 8:24 AM, Matthias J. Sax  wrote:
> 
>  Hi,
> 
>  I just observed that there are 7 flink-streaming-connectors available
>  but only 5 are documented on the web page.
> 
>  Flume and Nifi are not documented. Did we miss to extend the
>  documentation for both (which should have been part of the commit of
> the
>  code) or was this left out on purpose?
> 
>  -Matthias
> 
> 
> >>
> >>
> >
>
>


[jira] [Created] (FLINK-3323) Nifi connector not documented

2016-02-03 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created FLINK-3323:
--

 Summary: Nifi connector not documented
 Key: FLINK-3323
 URL: https://issues.apache.org/jira/browse/FLINK-3323
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Matthias J. Sax
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Connector Documentation missing

2016-02-03 Thread Matthias J. Sax
FlumeSink is there. FlumeSource and FlumeTopology is all put in
comments... Not sure about it.

There is no JIRA for a Flume connector... Does anyone know the status of
Flume connector?


On 02/03/2016 02:45 PM, Suneel Marthi wrote:
> Most of the flume code is commented out IIRC?
> 
> Sent from my iPhone
> 
>> On Feb 3, 2016, at 8:24 AM, Matthias J. Sax  wrote:
>>
>> Hi,
>>
>> I just observed that there are 7 flink-streaming-connectors available
>> but only 5 are documented on the web page.
>>
>> Flume and Nifi are not documented. Did we miss to extend the
>> documentation for both (which should have been part of the commit of the
>> code) or was this left out on purpose?
>>
>> -Matthias
>>
>>



signature.asc
Description: OpenPGP digital signature


[jira] [Created] (FLINK-3325) Unclosed InputStream in CEPPatternOperator#restoreState()

2016-02-03 Thread Ted Yu (JIRA)
Ted Yu created FLINK-3325:
-

 Summary: Unclosed InputStream in CEPPatternOperator#restoreState()
 Key: FLINK-3325
 URL: https://issues.apache.org/jira/browse/FLINK-3325
 Project: Flink
  Issue Type: Bug
Reporter: Ted Yu
Priority: Minor


Here is related code:
{code}
final InputStream is = stream.getState(getUserCodeClassloader());
final ObjectInputStream ois = new ObjectInputStream(is);
final DataInputViewStreamWrapper div = new DataInputViewStreamWrapper(is);

nfa = (NFA)ois.readObject();
{code}
Neither is nor ois is closed upon return from the method



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3329) Unclosed BackupEngine in AbstractRocksDBState#snapshot()

2016-02-03 Thread Ted Yu (JIRA)
Ted Yu created FLINK-3329:
-

 Summary: Unclosed BackupEngine in AbstractRocksDBState#snapshot()
 Key: FLINK-3329
 URL: https://issues.apache.org/jira/browse/FLINK-3329
 Project: Flink
  Issue Type: Bug
Reporter: Ted Yu


Here is related code:
{code}
  BackupEngine backupEngine = BackupEngine.open(Env.getDefault(),
new BackupableDBOptions(localBackupPath.getAbsolutePath()));

  backupEngine.createNewBackup(db);
{code}
BackupEngine implements AutoCloseable.
backupEngine should be closed upon return from the method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3330) Add SparseVector support to BLAS library in FlinkML

2016-02-03 Thread Chiwan Park (JIRA)
Chiwan Park created FLINK-3330:
--

 Summary: Add SparseVector support to BLAS library in FlinkML
 Key: FLINK-3330
 URL: https://issues.apache.org/jira/browse/FLINK-3330
 Project: Flink
  Issue Type: Improvement
  Components: Machine Learning Library
Affects Versions: 1.0.0
Reporter: Chiwan Park
Assignee: Chiwan Park


An user reported the problem using {{GradientDescent}} algorithm with 
{{SparseVector}}. 
(http://mail-archives.apache.org/mod_mbox/flink-user/201602.mbox/%3CCAMJxVsiNRy_B349tuRpC%2BY%2BfyW7j2SHcyVfhqnz3BGOwEHXHpg%40mail.gmail.com%3E)

It seems lack of SparseVector support in {{BLAS.axpy}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Connector Documentation missing

2016-02-03 Thread Matthias J. Sax
Hi,

I just observed that there are 7 flink-streaming-connectors available
but only 5 are documented on the web page.

Flume and Nifi are not documented. Did we miss to extend the
documentation for both (which should have been part of the commit of the
code) or was this left out on purpose?

-Matthias




signature.asc
Description: OpenPGP digital signature


[jira] [Created] (FLINK-3327) Attach the ExecutionConfig to the JobGraph and make it accessible to the AbstractInvocable.

2016-02-03 Thread Kostas (JIRA)
Kostas created FLINK-3327:
-

 Summary: Attach the ExecutionConfig to the JobGraph and make it 
accessible to the AbstractInvocable.
 Key: FLINK-3327
 URL: https://issues.apache.org/jira/browse/FLINK-3327
 Project: Flink
  Issue Type: Sub-task
Reporter: Kostas
Assignee: Kostas






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] Release 0.10.2

2016-02-03 Thread Ufuk Celebi

> On 02 Feb 2016, at 17:30, Robert Metzger  wrote:
> 
> Thank you for taking care of this. +1
> 
> I'd like to include this in to the 0.10.2 release:
> https://github.com/apache/flink/pull/1576

OK (merged to master and release-10)



[jira] [Created] (FLINK-3328) Incorrectly shaded dependencies in flink-runtime

2016-02-03 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-3328:
---

 Summary: Incorrectly shaded dependencies in flink-runtime
 Key: FLINK-3328
 URL: https://issues.apache.org/jira/browse/FLINK-3328
 Project: Flink
  Issue Type: Bug
  Components: Build System
Affects Versions: 1.0.0
Reporter: Stephan Ewen
Priority: Blocker
 Fix For: 1.0.0


There are apparently some dependencies shaded into {{flink-runtime}} fat jar 
that are not relocated. (the flink-runtime jar is now 70 MB)

>From the output of the shading in flink-dist, it looks as if this concerns at 
>least
  - Zookeeper
  - slf4j
  - jline
  - netty (3.x)

Possible more.

{code}
[WARNING] zookeeper-3.4.6.jar, flink-runtime_2.10-1.0-SNAPSHOT.jar define 440 
overlapping classes: 
[WARNING]   - org.apache.zookeeper.server.NettyServerCnxnFactory
[WARNING]   - org.apache.jute.compiler.JFile
[WARNING]   - org.apache.zookeeper.server.SessionTracker$Session
[WARNING]   - org.apache.zookeeper.server.quorum.AuthFastLeaderElection$1
[WARNING]   - org.apache.jute.compiler.JLong
[WARNING]   - org.apache.zookeeper.client.ZooKeeperSaslClient$SaslState
[WARNING]   - org.apache.zookeeper.server.auth.KerberosName$Rule
[WARNING]   - org.apache.jute.CsvOutputArchive
[WARNING]   - org.apache.zookeeper.server.quorum.QuorumPeer
[WARNING]   - org.apache.zookeeper.ZooKeeper$DataWatchRegistration
[WARNING]   - 430 more...
[WARNING] slf4j-api-1.7.7.jar, flink-runtime_2.10-1.0-SNAPSHOT.jar define 24 
overlapping classes: 
[WARNING]   - org.slf4j.spi.MarkerFactoryBinder
[WARNING]   - org.slf4j.helpers.SubstituteLogger
[WARNING]   - org.slf4j.helpers.BasicMarker
[WARNING]   - org.slf4j.helpers.Util
[WARNING]   - org.slf4j.LoggerFactory
[WARNING]   - org.slf4j.Marker
[WARNING]   - org.slf4j.helpers.NamedLoggerBase
[WARNING]   - org.slf4j.Logger
[WARNING]   - org.slf4j.spi.LocationAwareLogger
[WARNING]   - org.slf4j.ILoggerFactory
[WARNING]   - 14 more...
[WARNING] jansi-1.4.jar, jline-2.10.4.jar define 23 overlapping classes: 
[WARNING]   - org.fusesource.jansi.Ansi$Erase
[WARNING]   - org.fusesource.jansi.Ansi
[WARNING]   - org.fusesource.jansi.AnsiOutputStream
[WARNING]   - org.fusesource.jansi.internal.CLibrary
[WARNING]   - org.fusesource.jansi.Ansi$2
[WARNING]   - org.fusesource.jansi.WindowsAnsiOutputStream
[WARNING]   - org.fusesource.jansi.AnsiRenderer$Code
[WARNING]   - org.fusesource.jansi.AnsiConsole
[WARNING]   - org.fusesource.jansi.Ansi$Attribute
[WARNING]   - org.fusesource.jansi.internal.Kernel32
[WARNING]   - 13 more...
[WARNING] commons-beanutils-core-1.8.0.jar, commons-collections-3.2.2.jar, 
commons-beanutils-1.7.0.jar define 10 overlapping classes: 
[WARNING]   - org.apache.commons.collections.FastHashMap$EntrySet
[WARNING]   - org.apache.commons.collections.ArrayStack
[WARNING]   - org.apache.commons.collections.FastHashMap$1
[WARNING]   - org.apache.commons.collections.FastHashMap$KeySet
[WARNING]   - org.apache.commons.collections.FastHashMap$CollectionView
[WARNING]   - org.apache.commons.collections.BufferUnderflowException
[WARNING]   - org.apache.commons.collections.Buffer
[WARNING]   - 
org.apache.commons.collections.FastHashMap$CollectionView$CollectionViewIterator
[WARNING]   - org.apache.commons.collections.FastHashMap$Values
[WARNING]   - org.apache.commons.collections.FastHashMap
[WARNING] flink-streaming-scala_2.10-1.0-SNAPSHOT.jar, 
flink-core-1.0-SNAPSHOT.jar, flink-runtime_2.10-1.0-SNAPSHOT.jar, 
flink-java-1.0-SNAPSHOT.jar, flink-streaming-java_2.10-1.0-SNAPSHOT.jar, 
flink-scala_2.10-1.0-SNAPSHOT.jar, flink-clients_2.10-1.0-SNAPSHOT.jar, 
flink-optimizer_2.10-1.0-SNAPSHOT.jar, flink-runtime-web_2.10-1.0-SNAPSHOT.jar 
define 1690 overlapping classes: 
[WARNING]   - 
org.apache.flink.shaded.com.google.common.collect.LinkedListMultimap
[WARNING]   - 
org.apache.flink.shaded.com.google.common.io.ByteSource$AsCharSource
[WARNING]   - org.apache.flink.shaded.com.google.common.escape.Platform
[WARNING]   - 
org.apache.flink.shaded.com.google.common.util.concurrent.Futures$ImmediateFailedCheckedFuture
[WARNING]   - 
org.apache.flink.shaded.com.google.common.primitives.SignedBytes$LexicographicalComparator
[WARNING]   - 
org.apache.flink.shaded.com.google.common.cache.LocalCache$WriteQueue$2
[WARNING]   - org.apache.flink.shaded.com.google.common.escape.Escaper$1
[WARNING]   - 
org.apache.flink.shaded.com.google.common.collect.MultimapBuilder$SetMultimapBuilder
[WARNING]   - 
org.apache.flink.shaded.com.google.common.collect.Ordering$ArbitraryOrdering
[WARNING]   - 
org.apache.flink.shaded.com.google.common.collect.Synchronized$SynchronizedAsMapEntries$1
[WARNING]   - 1680 more...
[WARNING] flink-scala_2.10-1.0-SNAPSHOT.jar, flink-java-1.0-SNAPSHOT.jar, 
flink-streaming-scala_2.10-1.0-SNAPSHOT.jar, 
flink-runtime_2.10-1.0-SNAPSHOT.jar define 25 overlapping classes: 
[WARNING]   - org.apache.flink.shaded.org.objectweb.asm.Context