Re: user mail list

2016-09-17 Thread Jay Juma
+1 for a separate user mail list.

It is convenient for newbies like me, and most of the development discussion
is irrelevant to me.

- Jay

On Sun, Sep 18, 2016 at 10:12 AM, Khurrum Nasim 
wrote:

> Chris & Sijie,
>
>
> I feel it is worth separating the user and dev discussion into two mail
> lists. From a subscriber's perspective, I can easily set up filtering rules
> on the two lists and get all the information that I really care
> about.
>
> I'd prefer setting up a separate user list if we can.
>
> KN
>
> On Tue, Sep 13, 2016 at 1:07 PM, Chris Nauroth 
> wrote:
>
> > Typically podlings start out using the dev@ list to field user questions
> > and then split out a separate user@ list later, only if traffic from user
> > questions is sufficient to warrant the split.  There is no rule about this
> > though.  If your community prefers a separate user@ list even now, then
> > that’s fine, and mentors can help set that up.
> >
> > --Chris Nauroth
> >
> > On 9/12/16, 6:30 PM, "Sijie Guo"  wrote:
> >
> > Any suggestions from mentors? Maybe it is a good time to have a user
> > mail
> > list?
> >
> > - Sijie
> >
> > On Sat, Sep 10, 2016 at 11:21 PM, Sijie Guo 
> wrote:
> >
> > > Jay,
> > >
> > > Thank you for asking. Unfortunately we don't have a user mail list
> > yet.
> > >
> > > Since you are asking, it might be a good time to have one. I need to ask
> > > the podling mentors to see if we can create one.
> > >
> > > Thank you,
> > > Sijie
> > >
> > > On Sat, Sep 10, 2016 at 2:52 PM, Jay Juma 
> > wrote:
> > >
> > >> Hello,
> > >>
> > >> Is there a user mail list that I can join? I feel it is a bit weird to ask
> > >> some simple user questions on a dev mail list.
> > >>
> > >> Thanks,
> > >> Jay
> > >>
> > >
> > >
> >
> >
> >
>


tutorial about setting up a global replicated log

2016-09-17 Thread Jay Juma
Hello,

Do you have a tutorial on setting up a global replicated log cluster?

Thanks,
Jay


Re: [Discuss] Transaction Support

2016-09-17 Thread Xi Liu
Khurrum, thank you for your comment.

A few use cases for your reference.

- We partition the data by key (for example, user id). An operation may
update multiple keys at the same time. We'd like those updates to be seen by
the readers together, exactly once, with no partial-failure state in between. A
single stream-level transaction cannot help in this case.
- Imagine two stream computing jobs: the first one reads from a set of
distributedlog streams and writes its computation results to another set of
distributedlog streams, which are the input for the second job. We use one more
distributedlog stream to track the read DLSNs (offsets) for the first set of
streams, and we want updating the offset stream and propagating the computation
results into the second job's streams to happen atomically.
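
To make the second case concrete, here is a rough sketch of how we imagine
using it. None of these transaction APIs exist in distributedlog today - the
names (beginTransaction, write, commit, abort) are made up for this discussion,
and the types are stubbed out so the sketch stays self-contained:

// Hypothetical sketch only: none of these transaction APIs exist in
// distributedlog today; the names are made up for this discussion.
import java.nio.ByteBuffer;
import java.util.List;

public class AtomicOffsetCommitSketch {

    /** Stand-in for a namespace-level transaction (hypothetical). */
    interface LogTransaction {
        void write(String stream, ByteBuffer record); // buffered until commit
        void commit();                                // all-or-nothing
        void abort();
    }

    /** Stand-in for a namespace that can open transactions (hypothetical). */
    interface TransactionalNamespace {
        LogTransaction beginTransaction();
    }

    /** One iteration of the first job: emit results and advance the offsets atomically. */
    static void process(TransactionalNamespace namespace,
                        List<ByteBuffer> results,
                        ByteBuffer offsetRecord) {
        LogTransaction txn = namespace.beginTransaction();
        try {
            for (ByteBuffer result : results) {
                txn.write("output-stream", result);   // input of the second job
            }
            txn.write("offset-stream", offsetRecord); // read positions (DLSNs)
            txn.commit(); // the second job sees the results iff the offsets are recorded
        } catch (RuntimeException e) {
            txn.abort();  // neither the results nor the offsets become visible
            throw e;
        }
    }
}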

Let me know your opinions. Appreciate your help.

- Xi

On Sun, Sep 18, 2016 at 10:25 AM, Khurrum Nasim 
wrote:

> Xi,
>
> The "stream level" transaction makes more sense to me. It is an extension
> of 'atomic write' records over the size limitation. I can see the value of
> having such ability.
>
> But I am not convinced by the namespace level transaction. Do you have any
> concrete use cases that you can talk more about?
>
> - KN
>
> On Mon, Sep 12, 2016 at 5:20 PM, Xi Liu  wrote:
>
> > Hello,
> >
> > I asked about transaction support in the distributedlog user group two months
> > ago. I want to raise it again, as we are looking at using
> > distributedlog to build a transactional data service. It is a major
> > feature that is missing in distributedlog. We have some ideas for adding it
> > to distributedlog and want to know whether they make sense. If they are
> > good, we'd like to contribute and develop them with the community.
> >
> > Here are the thoughts:
> >
> > -
> >
> > From our understanding, DL can provide "at-least-once" delivery semantics
> > (if not, please correct me) but not "exactly-once" delivery semantics. That
> > means a message can be delivered one or more times if the reader
> > doesn't handle duplicates.
> >
> > The duplicates come from two places: the writer side (this assumes
> > using the write proxy, not the core library) and the reader
> > side.
> >
> > - Writer side: if the client attempts to write a record to the write
> > proxies, gets a network error (e.g. a timeout) and retries, the retry
> > will potentially result in duplicates.
> > - Reader side: if the reader reads a message from a stream and then crashes,
> > it will restart from the last known position (DLSN) when it comes back.
> > If the reader fails after processing a record but before recording the
> > position, the processed record will be delivered again.
> >
> > The reader problem can be properly addressed by making use of the sequence
> > numbers of records and doing proper checkpointing. For example, a
> > database can checkpoint the indexed data with the sequence numbers of the
> > records it has applied; Flink can checkpoint its state with the sequence numbers.
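> >
> > A minimal sketch of that idea (the Record and CheckpointStore types below
> > are hypothetical stand-ins, not the DL API; only a comparable sequence
> > number is assumed):
> >
> > // Sketch only: hypothetical stand-ins, not distributedlog classes.
> > interface Record {
> >     long sequenceId();   // monotonically increasing sequence number
> >     byte[] payload();
> > }
> >
> > interface CheckpointStore {
> >     long lastProcessed();                     // last checkpointed sequence number
> >     void save(byte[] state, long sequenceId); // state + position in one atomic step
> > }
> >
> > final class CheckpointingReader {
> >     /** Apply a record exactly once relative to the checkpointed position. */
> >     static void onRecord(Record record, CheckpointStore store, byte[] newState) {
> >         if (record.sequenceId() <= store.lastProcessed()) {
> >             return; // already applied before the crash; drop the duplicate
> >         }
> >         // Persist the derived state and the position atomically, so a crash
> >         // can never separate "processed" from "recorded".
> >         store.save(newState, record.sequenceId());
> >     }
> > }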
> >
> > The writer problem can be addressed by implementing an idempotent writer.
> > However, an alternative and more powerful approach is to support
> > transactions.
> >
> > *What does a transaction mean?*
> >
> > A transaction means a collection of records can be written transactionally
> > within a stream or across multiple streams. They will be consumed by the
> > reader together when a transaction is committed, or will never be consumed
> > by the reader when the transaction is aborted.
> >
> > The transaction will expose the following guarantees:
> >
> > - The reader should not be exposed to records written from uncommitted
> > transactions (mandatory)
> > - The reader should consume the records in the transaction commit order
> > rather than the record written order (mandatory)
> > - No duplicated records within a transaction (mandatory)
> > - Allow interleaving transactional writes and non-transactional writes
> > (optional)
> >
> > *Stream Transaction & Namespace Transaction*
> >
> > There will be two types of transactions: one is a stream-level transaction
> > (local transaction), while the other one is a namespace-level transaction
> > (global transaction).
> >
> > The stream level transaction is a transactional operation on writing
> > records to one stream; the namespace level transaction is a transactional
> > operation on writing records to multiple streams.
> >
> > *Implementation Thoughts*
> >
> > - A transaction consists of a begin control record, a series of data
> > records, and a commit/abort control record.
> > - The begin/commit/abort control records are written to a `commit` log
> > stream, while the data records will be written to normal data log streams.
> > - The `commit` log stream will be the same log stream for stream-level
> > transaction,  

Re: Repackaging the package namespace

2016-09-17 Thread Khurrum Nasim
+1

On Mon, Sep 12, 2016 at 6:16 PM, Stevo Slavić  wrote:

> + 1 (non-binding) for doing it asap
>
> On Mon, Sep 12, 2016 at 11:40 AM, Flavio Junqueira  wrote:
>
> > It isn't strictly necessary; here is what the documentation says:
> >
> > On Repackaging
> >
> > It is recommended - but not mandated - that source is repackaged under the
> > Apache namespace. There is no need to use the incubator namespace. For
> > example, Java source might be repackaged to org.apache.foo.Bar or a DTD to
> > http://dtd.apache.org/foo/bar.
> >
> > Existing open source projects moving to Apache may well need to consider
> > carefully how they will approach this transition.
> >
> >
> >
> > My suggestion is to repackage before the first release, though. This way
> > applications written against the early release will have the right
> > namespace already.
> >
> > -Flavio
> >
> > > On 10 Sep 2016, at 20:32, Sijie Guo  wrote:
> > >
> > > Any suggestions from mentors for the first release?
> > >
> > > - Sijie
> > >
> > > On Tue, Aug 30, 2016 at 11:18 PM, Sijie Guo  wrote:
> > >
> > >> Does anyone know if it is required to repackage the namespace under
> > >> org.apache for the first release? Any suggestions from mentors?
> > >>
> > >> - Sijie
> > >>
> >
> >
>


Re: question about DL namespace

2016-09-17 Thread Khurrum Nasim
+1 for the interface.

- KN

On Mon, Sep 12, 2016 at 5:46 PM, Jon Derrick 
wrote:

> Sijie, thank you for your comments.
>
> I'd like to make a proposal by introducing a `NamespaceResolver`.
>
> What does a namespace resolver do? A namespace resolver basically
> resolves a log stream name into a metadata location path, so DL knows
> where to locate the metadata of the log stream. The resolver also takes
> responsibility for validating stream names and managing the hierarchy
> of streams.
>
> The NamespaceResolver interface would look like this:
>
> public interface NamespaceResolver {
>
>     /** validate if the stream name is okay */
>     boolean validateStreamName(String streamName);
>
>     /** resolve the stream name into the location path of the metadata */
>     String resolveStreamPath(String streamName);
> }
>
> So a filesystem-like namespace resolver will only accept absolute,
> file-like paths as stream names, while a kafka-like (what Khurrum
> mentioned) namespace resolver will probably accept names like
> '/'.
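>
> Just to illustrate the interface, a rough sketch of what a filesystem-like
> resolver could look like (the validation rules here are only an example):
>
> public class FilesystemNamespaceResolver implements NamespaceResolver {
>
>     @Override
>     public boolean validateStreamName(String streamName) {
>         // Only accept absolute, file-like paths such as '/orders/eu/stream-1'.
>         return streamName != null
>                 && streamName.startsWith("/")
>                 && !streamName.endsWith("/")
>                 && !streamName.contains("//");
>     }
>
>     @Override
>     public String resolveStreamPath(String streamName) {
>         // The stream name is already a path, so the metadata can live at that
>         // path directly under the namespace root.
>         return streamName;
>     }
> }
>
> A kafka-like resolver would instead map names onto a layout such as the
> 'namespace/topic/partitions/N' scheme Khurrum described.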
>
> A namespace resolver will be added to the namespace metadata binding and
> loaded via reflection.
>
> Any thoughts? I will send out a pull request soon.
>
> - jd
>
>
> On Tue, Aug 23, 2016 at 9:07 AM, Khurrum Nasim 
> wrote:
>
> > On Thu, Aug 18, 2016 at 2:30 AM, Sijie Guo  wrote:
> >
> > > Jon,
> > >
> > > Sorry for the late response. This is a very good question. Comments inline.
> > >
> > > Sijie
> > >
> > > On Monday, August 15, 2016, Jon Derrick 
> > > wrote:
> > >
> > > > Hello all,
> > > >
> > > > I read the distributedlog code closely and found that the DL namespace is a
> > > > flat namespace. There will be a potential issue if a lot of
> > > > streams are created under the same namespace. I am very curious what the
> > > > thoughts behind that are. Here are some questions:
> > > >
> > > > - How many streams can a namespace support?
> > >
> > >
> > > The maximum number of streams we have had for a single namespace is more
> > > than 30k. But yup, you are right. It is limited by the number of children
> > > that a znode can have.
> > >
> > > >
> > > >
> > > > It seems to be bound by
> > > > the limitation on the number of children that a zookeeper znode can have.
> > > > What's the maximum number of logs you guys have?
> > > > - Why not choose a tree representation? Then it might be easier to organize
> > > > streams. For example, if I want to use multiple dl streams as partitions, I
> > > > can easily organize them together under the same znode.
> > >
> > >
> > > We don't want DL to focus on partitions. We let applications decide how
> > > to partition, so we chose a simple way to start. However, I don't think it
> > > is necessary to have just a flat namespace. You probably already noticed that
> > > there is another namespace implementation that supports hierarchy.
> > >
> > > If you would like to support a filesystem-like namespace, I would suggest adding
> > > a namespace type to the metadata binding, so it can support different types of
> > > namespaces. Does that meet your requirements?
> > >
> >
> > +1 for supporting different types of namespaces. I want to organize a kafka
> > topic in the following format:
> >
> > namespace/topic/partitions : storing all the partitions
> > namespace/topic/partitions/N : storing the given partition `N`
> > namespace/topic/subscriptions : storing all the subscriptions
> > namespace/topic/subscriptions/S : storing the information of subscription `S`
> >
> > both `namespace/topic/partitions/N` and `namespace/topic/subscriptions/S`
> > are DL streams.
> >
> > It would be easier for me to manage the streams if I could customize the
> > namespace layout.
> >
> > - KN
> >
> >
> > >
> > >
> > > > - Also, if it is a tree-like namespace, it might be easier to implement a
> > > > filesystem over the streams. Each file can be backed by one dl stream. In
> > > > that way, I can also use DL as long-term storage.
> > > >
> > > > Any thoughts? Appreciate your comments.
> > > >
> > > >
> > > > --
> > > > - jderrick
> > > >
> > >
> >
>
>
>
> --
> - jderrick
>


Re: Website

2016-09-17 Thread Khurrum Nasim
Awesome. The website looks good.

Sijie, it would be good if you could add documentation on how to build the
website and the docs.

- KN

On Tue, Sep 13, 2016 at 5:29 PM, Sijie Guo  wrote:

> We shipped the website and it is live under
> http://distributedlog.incubator.apache.org/
>
> The documentation has been re-organized into three major parts: 'start' -
> the pages for getting started; 'user-guide' - the pages
> users need to understand the architecture and use it; and 'admin-guide' -
> the pages on how to operate the cluster
> and such.
>
> The documentation is still not good enough. Please help us improve the
> website if you find any issues. Also, I created two master tickets to
> track any improvements related to user-guide
>  and admin-guide
> .
>
> - Sijie
>
>
> On Fri, Sep 9, 2016 at 12:41 AM, Sijie Guo  wrote:
>
> > I played with the jekyll-rst plugin to build the documentation and pushed the
> > latest built website to my github:
> >
> > https://sijie.github.io/incubator-distributedlog/
> >
> > The website contains two parts:
> >
> > - One is the content under the `website` directory. It contains the information
> > that probably will not change between releases, like community and
> > developer information.
> > - The other is the content under the `docs` directory. It contains all the
> > documentation that is aligned with each release.
> >
> > The build.sh script under `website` will link the current docs as `latest`
> > (we can add a stable release once we get to a release) and build the whole
> > site.
> >
> > Let me know if the structure looks good. If it is okay, we can push the
> > website and iterate from there.
> >
> > - Sijie
> >
> >
> >
> > On Mon, Sep 5, 2016 at 6:02 PM, Sijie Guo  wrote:
> >
> >> I pushed the old content to the asf-site branch of the current repo and enabled
> >> gitpubsub. So the content of distributedlog.io is live at
> >> distributedlog.incubator.apache.org now.
> >>
> >> The new site will come up soon.
> >>
> >> Sijie
> >>
> >>
> >> On Thursday, August 25, 2016, Sijie Guo  wrote:
> >>
> >>> Based on the post https://blogs.apache.org/infra/entry/git_based_websites_available,
> >>> we can enable 'gitpubsub' on an asf git repo, so it will pull the content
> >>> either under the root directory or the `content` directory from the
> >>> *asf-site* branch of that repo.
> >>>
> >>> I checked other asf projects. I found there are two approaches to do
> >>> that.
> >>>
> >>> 1) Use a separate repo for storing the content of the website, so there will
> >>> be two repos: one is `project`, while the other one is typically
> >>> `project`-site or `project`-web.
> >>> 2) Use a single repo and just put the built static content into the *asf-site*
> >>> branch.
> >>>
> >>> I am kind of leaning toward 2), since I'd like to keep documentation and
> >>> code together in a single repo. That makes it easier to ensure that whenever there
> >>> is a code change, the documentation is updated to reflect it. We can
> >>> probably write a script to build the website and push the built static
> >>> content to the *asf-site* branch.
> >>>
> >>> Any thoughts?
> >>>
> >>> - Sijie
> >>>
> >>>
> >>>
> >>> On Tue, Aug 23, 2016 at 10:00 PM, Sijie Guo  wrote:
> >>>
>  Hi all,
> 
>  I put up a website following other apache projects, using jekyll and
>  bootstrap.
> 
>  the demo is here https://sijie.github.io/incubator-distributedlog/
>  and the git pull request: https://github.com/apache/incubator-distributedlog/pull/13
> 
>  Most of the links point to http://distributedlog.io/ directly
>  for now. We can try to use the jekyll-rst plugin
>   to compile the existing rst files
>  under doc to static files.
> 
>  Please take a look and let me know if it is okay.
> 
>  Also, I need to investigate how Apache can host the website from a git
>  repo. If anyone knows how to do it, please let me know.
> 
>  - Sijie
> 
> 
> >>>
> >
>


Re: Wrong email list to sent Git and JIRA updates?

2016-09-17 Thread Sijie Guo
No objection from me. Thank you, Henry.

- Sijie

On Sat, Sep 17, 2016 at 3:01 PM, Henry Saputra 
wrote:

> The JIRA issues for infra are these:
>
> https://issues.apache.org/jira/browse/INFRA-12243
> https://issues.apache.org/jira/browse/INFRA-12245
>
> It didn't specify the target list, so I assume it was a mistake from infra.
>
> If there is no objection, I will create a JIRA for infra to update the target list to dev@
> instead of commits@, which is only for commit messages.
>
> - Henry
>
> On Fri, Sep 16, 2016 at 11:35 PM, Henry Saputra 
> wrote:
>
> > Hi Guys,
> >
> > I just realized that all Git and GitHub updates, and also JIRA updates, are
> > sent to the commits@ list instead of the dev@ list. This is not intentional, is
> > it?
> >
> > Who handles the configuration for the GitHub integration and the JIRA space request?
> >
> > Thanks,
> >
> > - Henry
> >
>


Wrong email list to sent Git and JIRA updates?

2016-09-17 Thread Henry Saputra
Hi Guys,

I just realized that all Git and GitHub updates, and also JIRA updates, are sent
to the commits@ list instead of the dev@ list. This is not intentional, is it?

Who handles the configuration for the GitHub integration and the JIRA space request?

Thanks,

- Henry