Re: example applications in malhar

2017-03-12 Thread Thomas Weise
I think a flat list with good naming would make it easier to find a
matching example.

topnwords? There are already two related examples in Malhar: wordcount and
twitter.

Can some of the examples be consolidated into a single example package?
fileioXXX/fileoutput look like candidates.


On Sat, Mar 11, 2017 at 6:36 PM, Lakshmi Velineni 
wrote:

> Hi,
>
> I have come up with the following list of applications to move from
> datatorrent/examples into apex-malhar/examples. Many of the examples
> perform a similar function, moving data from different sources to
> different destinations, so I grouped them all into a separate folder
> called "Ingestion".
>
> csvformatter
> dedup
> dynamic-partition
> enricher
> exactly-once
> filter
> innerjoin
> maprapp
> parser
> partition
> recordreader
> throttle
> topnwords
> transform
> unifiers
>
> *Ingestion/*
>
>   cassandrainput
>   cassandraoutput
>   fileio
>   fileio-multidir
>   fileio-simple
>   fileoutput
>   filetojdbc
>   hdfs2kafka
>   hdfs-sync
>   jdbcingest
>   jdbctijdbc
>   jmsActiveMQ
>   jmsSqs
>   kafka
>   s3-to-hdfs-sync
>   s3output
>
>
> Let me know your thoughts.
>
>
> Thanks
>
> Lakshmi Prasanna
>
> On Thu, Feb 23, 2017 at 10:34 AM, Amol Kekre  wrote:
>
> > Yes, should merge samples into examples. Ideally the names of the examples
> > should be more descriptive in terms of "how to" as opposed to a title. PI
> > demo for example has lots of ways to do compute. So it could be named as
> > "pi - distributed compute" in examples. Similarly if others can be named to
> > bring out features to look at from the examples, it would make examples
> > more useful to readers.
> >
> > Thks
> > Amol
> >
> >
> > On Thu, Feb 23, 2017 at 10:07 AM, Sanjay Pujare 
> > wrote:
> >
> > > +1 for renaming to examples. While we are at it, how about merging
> > > "samples" also into the new "examples"?
> > >
> > > On Thu, Feb 23, 2017 at 9:47 AM, Munagala Ramanath <r...@datatorrent.com>
> > > wrote:
> > >
> > > > +1 for renaming to "examples"
> > > >
> > > > Ram
> > > >
> > > > On Thu, Feb 23, 2017 at 9:12 AM, Lakshmi Velineni <laks...@datatorrent.com>
> > > > wrote:
> > > >
> > > > > I am ready to bring the examples over into the demos folder. I was
> > > > > wondering if anybody has any input on Thomas's suggestion to rename the
> > > > > demos folder to examples. I would rather do that first and then bring the
> > > > > examples over instead of doing it the other way around as that would lead
> > > > > to refactoring the new examples again.
> > > > >
> > > > > Thanks
> > > > >
> > > > > On Wed, Jan 25, 2017 at 8:12 AM, Lakshmi Velineni <laks...@datatorrent.com>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Since the examples have little history I was planning to have two
> > > > > > commits for every example, one for the code as the primary author of
> > > > > > the example and another containing pom.xml and other changes to make
> > > > > > it work under malhar.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > On Wed, Nov 2, 2016 at 9:49 PM, Lakshmi Velineni
> > > > > >  wrote:
> > > > > > > > Thanks for the suggestions and I am working on the process to migrate the
> > > > > > > > examples with the guidelines you mentioned. I will send out a list of
> > > > > > > > examples and the destination modules very soon.
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Oct 27, 2016 at 1:43 PM, Thomas Weise <thomas.we...@gmail.com>
> > > > > > > > wrote:
> > > > > > >>
> > > > > > > >> Maybe a good first step would be to identify which examples to bring over
> > > > > > > >> and where appropriate how to structure them in Malhar (for example, I see
> > > > > > > >> multiple hdfs related apps that could go into the same Maven module).
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> On Tue, Oct 25, 2016 at 1:00 PM, Thomas Weise wrote:
> > > > > > >>
> > > > > > > >> > That would be great. There are a few things to consider when working on
> > > > > > > >> > it:
> > > > > > > >> >
> > > > > > > >> > * preserve attribution
> > > > > > > >> > * ensure there is a test that runs the application in the CI
> > > > > > > >> > * check that dependencies have compatible licenses
> > > > > > > >> > * maybe extract common boilerplate code from pom.xml
> > > > > > >> >

[jira] [Resolved] (APEXMALHAR-2130) Scalable windowed storage

2017-03-12 Thread Thomas Weise (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Weise resolved APEXMALHAR-2130.
--
Resolution: Implemented

> Scalable windowed storage
> -
>
> Key: APEXMALHAR-2130
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2130
> Project: Apache Apex Malhar
>  Issue Type: Task
>Reporter: bright chen
>Assignee: David Yan
>  Labels: roadmap
> Fix For: 3.7.0
>
>
> This feature is used for supporting windowing.
> The storage needs to have the following features:
> 1. Spillable key value storage (integrate with APEXMALHAR-2026)
> 2. Upon checkpoint, it saves a snapshot for the entire data set with the 
> checkpointing window id.  This should be done incrementally (ManagedState) to 
> avoid wasting space with unchanged data
> 3. When recovering, it takes the recovery window id and restores to that 
> snapshot
> 4. When a window is committed, all windows with a lower ID should be purged 
> from the store.
> 5. It should implement the WindowedStorage and WindowedKeyedStorage 
> interfaces, and because of 2 and 3, we may want to add methods to the 
> WindowedStorage interface so that the implementation of WindowedOperator can 
> notify the storage of checkpointing, recovering and committing of a window.
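
Items 2 to 4 above can be illustrated with a small in-memory sketch. It is
not the Malhar implementation and does not implement WindowedStorage or
ManagedState: the class name CheckpointedStoreSketch and the methods
beforeCheckpoint, recover, committed and retainedCheckpoints are names
assumed here for illustration, and spilling to disk (item 1) is left out.
Each checkpoint stores only the entries changed since the previous one, and
a commit folds older checkpoints into a base map before dropping them.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in; not the Malhar WindowedStorage API. */
final class CheckpointedStoreSketch<K, V> {
  // Current working state (values are assumed to be non-null).
  private final Map<K, V> live = new HashMap<>();
  // Net effect of checkpoints that were purged after a commit.
  private final Map<K, V> base = new HashMap<>();
  // Changes since the last checkpoint; Optional.empty() marks a removal.
  private Map<K, Optional<V>> dirty = new HashMap<>();
  // One incremental snapshot (delta) per checkpoint window id, oldest first.
  private final TreeMap<Long, Map<K, Optional<V>>> deltas = new TreeMap<>();

  void put(K key, V value) {
    live.put(key, value);
    dirty.put(key, Optional.of(value));
  }

  void remove(K key) {
    live.remove(key);
    dirty.put(key, Optional.empty());
  }

  V get(K key) {
    return live.get(key);
  }

  /** Item 2: record only what changed since the previous checkpoint. */
  void beforeCheckpoint(long windowId) {
    deltas.put(windowId, dirty);
    dirty = new HashMap<>();
  }

  /** Item 3: rebuild the state as of the given checkpoint window id. */
  void recover(long windowId) {
    deltas.tailMap(windowId, false).clear(); // discard later checkpoints
    live.clear();
    live.putAll(base);
    for (Map<K, Optional<V>> delta : deltas.values()) {
      apply(delta, live); // replay deltas in ascending window id order
    }
    dirty = new HashMap<>();
  }

  /** Item 4: purge checkpoints older than the committed window id. */
  void committed(long windowId) {
    Map<Long, Map<K, Optional<V>>> older = deltas.headMap(windowId, false);
    for (Map<K, Optional<V>> delta : older.values()) {
      apply(delta, base); // keep their net effect so recovery still works
    }
    older.clear();
  }

  int retainedCheckpoints() {
    return deltas.size();
  }

  private static <A, B> void apply(Map<A, Optional<B>> delta, Map<A, B> target) {
    for (Map.Entry<A, Optional<B>> e : delta.entrySet()) {
      if (e.getValue().isPresent()) {
        target.put(e.getKey(), e.getValue().get());
      } else {
        target.remove(e.getKey());
      }
    }
  }
}
```

In this sketch a WindowedOperator-like caller would invoke beforeCheckpoint,
recover and committed at the corresponding lifecycle events, which is the
kind of notification item 5 proposes adding to the storage interface.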



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (APEXMALHAR-2271) Special SpillableSetMultimap<K, Window.SessionWindow> handling in ManagedState in SpillableSessionWindowedStorage

2017-03-12 Thread Thomas Weise (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Weise updated APEXMALHAR-2271:
-
Issue Type: Improvement  (was: Sub-task)
Parent: (was: APEXMALHAR-2130)

> Special SpillableSetMultimap<K, Window.SessionWindow> handling in
> ManagedState in SpillableSessionWindowedStorage
> 
>
> Key: APEXMALHAR-2271
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2271
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Reporter: David Yan
>Assignee: Siyuan Hua
>
> The problem with this data structure is that the key is not time based,
> which makes purging difficult.
> One idea is to have `put` write the key and the window set to the timebucket 
> that the latest window in that set belongs to, which means possible 
> duplication of the key and the window set, and make `get` do a for loop on 
> all time buckets to find the key. That way, purging based on the time bucket 
> can still be done.
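
The idea above could look roughly like the following sketch. It is only an
illustration under stated assumptions, not the ManagedState code: the class
TimeBucketedSetMultimapSketch, the method purgeBefore, and the use of plain
long timestamps in place of Window.SessionWindow are all hypothetical. It
also assumes that when a key appears in several buckets, the copy in the
newest bucket is the current one.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch; windows are represented by their timestamps. */
final class TimeBucketedSetMultimapSketch<K> {
  private final long bucketSpanMillis;
  // Time bucket index -> (key -> window set), ordered by bucket index.
  private final TreeMap<Long, Map<K, Set<Long>>> buckets = new TreeMap<>();

  TimeBucketedSetMultimapSketch(long bucketSpanMillis) {
    if (bucketSpanMillis <= 0) {
      throw new IllegalArgumentException("bucket span must be positive");
    }
    this.bucketSpanMillis = bucketSpanMillis;
  }

  /** Writes the key and its whole window set to the bucket of the latest window. */
  void put(K key, Set<Long> windowTimestamps) {
    if (windowTimestamps.isEmpty()) {
      throw new IllegalArgumentException("window set must not be empty");
    }
    long latest = Collections.max(windowTimestamps);
    long bucket = Math.floorDiv(latest, bucketSpanMillis);
    // Older buckets may still hold a stale copy of this key (the duplication
    // mentioned in the issue); they disappear when their bucket is purged.
    buckets.computeIfAbsent(bucket, b -> new HashMap<>())
        .put(key, new HashSet<>(windowTimestamps));
  }

  /** Loops over all time buckets, newest first, to find the key. */
  Set<Long> get(K key) {
    for (Map<K, Set<Long>> bucket : buckets.descendingMap().values()) {
      Set<Long> windows = bucket.get(key);
      if (windows != null) {
        return Collections.unmodifiableSet(windows);
      }
    }
    return Collections.emptySet();
  }

  /** Drops every bucket that lies entirely before the given timestamp. */
  void purgeBefore(long timestampMillis) {
    long firstKept = Math.floorDiv(timestampMillis, bucketSpanMillis);
    buckets.headMap(firstKept, false).clear();
  }

  int bucketCount() {
    return buckets.size();
  }
}
```

The cost of this layout is that get is linear in the number of live buckets,
which is the trade-off the issue accepts in exchange for time-based purging.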





[jira] [Updated] (APEXMALHAR-2203) Control tuple port and watermark support in high-level API (version 1)

2017-03-12 Thread Thomas Weise (JIRA)

 [ 
https://issues.apache.org/jira/browse/APEXMALHAR-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Weise updated APEXMALHAR-2203:
-
Fix Version/s: (was: 3.7.0)

> Control tuple port and watermark support in high-level API (version 1)
> --
>
> Key: APEXMALHAR-2203
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2203
> Project: Apache Apex Malhar
>  Issue Type: Improvement
>Affects Versions: 3.6.0
>Reporter: Siyuan Hua
>Assignee: Siyuan Hua
>



