[GitHub] storm pull request #1756: STORM-1278: Port org.apache.storm.daemon.worker to...

2016-11-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/storm/pull/1756


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] storm pull request #1775: STORM-1278: Port org.apache.storm.daemon.worker to...

2016-11-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/storm/pull/1775




[GitHub] storm issue #1410: STORM-1778: scheme extension framework for KafkaSourcePro...

2016-11-15 Thread vesense
Github user vesense commented on the issue:

https://github.com/apache/storm/pull/1410
  
@chuanlei I think this issue has been fixed by STORM-2173. Can you close 
this?
BTW, the `CsvScheme` in STORM-2173 uses a standard RFC 4180 CSV parser that supports strings like `a,"b,c",d`.
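
For anyone curious about the quoting behavior, here is a minimal stand-alone sketch of RFC 4180 parsing using Apache Commons CSV (assumed to be on the classpath as org.apache.commons:commons-csv). The class name and wiring below are illustrative only and are not taken from the `CsvScheme` added in STORM-2173:

```java
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class Rfc4180ParseExample {
    public static void main(String[] args) throws IOException {
        // A quoted field keeps its embedded comma, so this line parses into exactly three fields.
        String line = "a,\"b,c\",d";
        try (CSVParser parser = CSVParser.parse(line, CSVFormat.RFC4180)) {
            CSVRecord record = parser.iterator().next();
            List<String> fields = new ArrayList<>();
            for (String field : record) {
                fields.add(field);
            }
            System.out.println(fields); // prints [a, b,c, d]
        }
    }
}
```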




Re: [DISCUSS] Release Storm 1.1.0

2016-11-15 Thread P. Taylor Goetz
Thanks Xin, I added it to the 1.1.0 epic.

-Taylor

> On Nov 15, 2016, at 9:01 AM, Xin Wang  wrote:
> 
> STORM-2198 (PR: https://github.com/apache/storm/pull/1773) fixes a bug in
> storm-hdfs. Should we consider including it?
> 
> Thanks,
> Xin Wang (vesense)
> 
> 2016-11-15 10:03 GMT+08:00 Jungtaek Lim :
> 
>> Some issues on Storm SQL are resolved but not documented yet. I'll file an
>> issue and assign to 1.1.0 release epic.
>> And also I want to address dropping aggregation and join on Storm SQL
>> Trident mode before releasing. I'll assign it too.
>> 
>> - Jungtaek Lim (HeartSaVioR)
>> 
>> 
>> On Tue, Nov 15, 2016 at 5:55 AM, P. Taylor Goetz wrote:
>> 
>>> I think we’re very close. I would like to confirm that the 1.x-branch is
>>> not affected by STORM-2176.
>>> 
>>> The worker lifecycle API was added in 1.0, but doesn’t work in any
>>> released version due to STORM-2176.
>>> 
>>> If there are any other open JIRAs that anyone is passionate about, now
>>> would be a good time to assign them to the 1.1.0 release epic
>> (STORM-1856).
>>> 
>>> -Taylor
>>> 
>>> 
>>> 
 On Oct 27, 2016, at 12:19 PM, Jungtaek Lim  wrote:
 
 Finally Pacemaker H/A, Supervisor V2, and Storm SQL PRs which were
>> opened
 at the last mail (4 weeks ago) are all merged to 1.x branch.
 
 There're some more PRs on Storm SQL opened, but given that we can
>> release
 new minor at any time when we feel it's enough change, I can wait for
>> it.
 They didn't get reviewed yet indeed.
 
 Is there something else we would want to include it to 1.1.0?
 
 Thanks,
 Jungtaek Lim (HeartSaVioR)
 
 On Sat, Oct 1, 2016 at 9:30 AM, Jungtaek Lim wrote:
 
> Personally, merging and porting back to three branches are painful
>>> enough,
> especially we don't have merging script and having verbose process (I
>>> mean
> CHANGELOG).
> It would be better if merging process is automated (by running script
>> or
> so), so I'd +1 to revisit Harsha's suggestion (adopting Kafka merge
>>> script)
> and modify script to fit to Storm.
> (It will not work if it's the case we need to handle PRs for each
>>> version
> line, since 'Close' in commit log doesn't close the PR if its target
>>> branch
> is not master.)
> 
> Anyway, without automation I don't want to maintain more version
>> lines.
> I'm looking at the announces from other projects, and others are only
> maintaining two version lines.
> Since we maintain 2.0.0 version line we can't reduce version lines to
>> 2,
> but hopefully at most 3.
> 
> Btw, let's check pending pull requests and enumerate which can be
>>> included
> in 1.0.0, and start/finish review and merge them soon.
> For me Supervisor V2 and Pacemaker H/A, and pending Storm SQL PRs can
>> be
> included, since they are small or in reviewing and expected to pass
>>> review
> phase soon.
> (And some small PRs. There're other valuable PRs in PR list but I'm
>> not
> sure we can review them soon. One example is unified API.)
> 
> One issue which is not clear is STORM-2006
> . This is a candidate for
>>> me,
> but gets blocked while reviewing. If we plan to put great effort to
>>> revise
> Metric we can skip this.
> 
> Please enumerate other PRs as well if you want to include in 1.1.0.
> 
> Thanks,
> Jungtaek Lim
> 
> On Fri, Sep 30, 2016 at 11:09 PM, Bobby Evans wrote:
> 
> Sounds good to me.  It would be nice to get some of the new features
>>> out.
> Do we expect to maintain both 1.0.x and 1.1.x lines with bug fixes?
>>> And if
> so for how long do we want to do this for? - Bobby
> 
>   On Thursday, September 29, 2016 7:35 PM, Jungtaek Lim <
> kabh...@gmail.com> wrote:
> 
> 
> Hi devs,
> 
> It's been 5 months after releasing Storm 1.0.0, and now 1.x branch has
>>> lots
> of CHANGELOG and also pending reviews.
> It's also been a long time after 1.1.0 RC1 is canceled.
> 
> I think it may be good to put some efforts to review and merge pending
>>> pull
> requests (except things which takes time to review and test), and
>>> release
> 1.1.0 soon.
> 
> What do you think?
> 
> I'm also open to volunteer release manager for 1.1.0 after we document
>>> the
> process of official release.
> 
> Thanks,
> Jungtaek Lim (HeartSaVioR)
> 
> 
> 
> 
> 
>>> 
>>> 
>> 



[GitHub] storm pull request #1766: [STORM-2192] Add a new IAutoCredentials plugin to ...

2016-11-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/storm/pull/1766




Re: [DISCUSS] Release Storm 1.1.0

2016-11-15 Thread Aaron Niskodé-Dossett
I would be +1 for including it

On Tue, Nov 15, 2016 at 8:01 AM Xin Wang  wrote:

STORM-2198 (PR: https://github.com/apache/storm/pull/1773) fixes a bug in
storm-hdfs. Should we consider including it?

Thanks,
Xin Wang (vesense)

2016-11-15 10:03 GMT+08:00 Jungtaek Lim :

> Some issues on Storm SQL are resolved but not documented yet. I'll file an
> issue and assign to 1.1.0 release epic.
> And also I want to address dropping aggregation and join on Storm SQL
> Trident mode before releasing. I'll assign it too.
>
> - Jungtaek Lim (HeartSaVioR)
>
>
> On Tue, Nov 15, 2016 at 5:55 AM, P. Taylor Goetz wrote:
>
> > I think we’re very close. I would like to confirm that the 1.x-branch is
> > not affected by STORM-2176.
> >
> > The worker lifecycle API was added in 1.0, but doesn’t work in any
> > released version due to STORM-2176.
> >
> > If there are any other open JIRAs that anyone is passionate about, now
> > would be a good time to assign them to the 1.1.0 release epic
> (STORM-1856).
> >
> > -Taylor
> >
> >
> >
> > > On Oct 27, 2016, at 12:19 PM, Jungtaek Lim  wrote:
> > >
> > > Finally Pacemaker H/A, Supervisor V2, and Storm SQL PRs which were
> opened
> > > at the last mail (4 weeks ago) are all merged to 1.x branch.
> > >
> > > There're some more PRs on Storm SQL opened, but given that we can
> release
> > > new minor at any time when we feel it's enough change, I can wait for
> it.
> > > They didn't get reviewed yet indeed.
> > >
> > > Is there something else we would want to include it to 1.1.0?
> > >
> > > Thanks,
> > > Jungtaek Lim (HeartSaVioR)
> > >
> > > On Sat, Oct 1, 2016 at 9:30 AM, Jungtaek Lim wrote:
> > >
> > >> Personally, merging and porting back to three branches are painful
> > enough,
> > >> especially we don't have merging script and having verbose process (I
> > mean
> > >> CHANGELOG).
> > >> It would be better if merging process is automated (by running script
> or
> > >> so), so I'd +1 to revisit Harsha's suggestion (adopting Kafka merge
> > script)
> > >> and modify script to fit to Storm.
> > >> (It will not work if it's the case we need to handle PRs for each
> > version
> > >> line, since 'Close' in commit log doesn't close the PR if its target
> > branch
> > >> is not master.)
> > >>
> > >> Anyway, without automation I don't want to maintain more version
> lines.
> > >> I'm looking at the announces from other projects, and others are only
> > >> maintaining two version lines.
> > >> Since we maintain 2.0.0 version line we can't reduce version lines to
> 2,
> > >> but hopefully at most 3.
> > >>
> > >> Btw, let's check pending pull requests and enumerate which can be
> > included
> > >> in 1.0.0, and start/finish review and merge them soon.
> > >> For me Supervisor V2 and Pacemaker H/A, and pending Storm SQL PRs can
> be
> > >> included, since they are small or in reviewing and expected to pass
> > review
> > >> phase soon.
> > >> (And some small PRs. There're other valuable PRs in PR list but I'm
> not
> > >> sure we can review them soon. One example is unified API.)
> > >>
> > >> One issue which is not clear is STORM-2006
> > >> . This is a candidate for
> > me,
> > >> but gets blocked while reviewing. If we plan to put great effort to
> > revise
> > >> Metric we can skip this.
> > >>
> > >> Please enumerate other PRs as well if you want to include in 1.1.0.
> > >>
> > >> Thanks,
> > >> Jungtaek Lim
> > >>
> > >> On Fri, Sep 30, 2016 at 11:09 PM, Bobby Evans wrote:
> > >>
> > >> Sounds good to me.  It would be nice to get some of the new features
> > out.
> > >> Do we expect to maintain both 1.0.x and 1.1.x lines with bug fixes?
> > And if
> > >> so for how long do we want to do this for? - Bobby
> > >>
> > >>On Thursday, September 29, 2016 7:35 PM, Jungtaek Lim <
> > >> kabh...@gmail.com> wrote:
> > >>
> > >>
> > >> Hi devs,
> > >>
> > >> It's been 5 months after releasing Storm 1.0.0, and now 1.x branch
has
> > lots
> > >> of CHANGELOG and also pending reviews.
> > >> It's also been a long time after 1.1.0 RC1 is canceled.
> > >>
> > >> I think it may be good to put some efforts to review and merge
pending
> > pull
> > >> requests (except things which takes time to review and test), and
> > release
> > >> 1.1.0 soon.
> > >>
> > >> What do you think?
> > >>
> > >> I'm also open to volunteer release manager for 1.1.0 after we
document
> > the
> > >> process of official release.
> > >>
> > >> Thanks,
> > >> Jungtaek Lim (HeartSaVioR)
> > >>
> > >>
> > >>
> > >>
> > >>
> >
> >
>


[GitHub] storm pull request #1779: STORM-1694: Kafka Spout Trident Implementation Usi...

2016-11-15 Thread hmcl
GitHub user hmcl opened a pull request:

https://github.com/apache/storm/pull/1779

STORM-1694: Kafka Spout Trident Implementation Using New Kafka Consumer API

@harshach I refactored the patch that was merged so that it compiles here. The
changes remove the use of lambdas and fix some generics compile-time type
resolution.
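
To illustrate the kind of change involved, here is a hedged sketch of a Java 8 lambda rewritten as an anonymous inner class, together with an explicit type witness where pre-Java-8 inference cannot use the target type of a method argument. All names in it are made up for illustration and do not come from the actual patch:

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration only; none of these names are from the STORM-1694 patch.
public class LambdaRemovalExample {

    private static int count(List<String> items) {
        return items.size();
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        final String record = "some-record";

        // Java 8 form that an older source level rejects:
        //   executor.submit(() -> System.out.println("replaying " + record));

        // Equivalent anonymous inner class:
        executor.submit(new Runnable() {
            @Override
            public void run() {
                System.out.println("replaying " + record);
            }
        });

        // Explicit type witness where pre-Java-8 inference cannot pick the element
        // type from the argument position, so Collections.emptyList() alone would
        // be inferred as List<Object> and fail to compile:
        count(Collections.<String>emptyList());

        executor.shutdown();
    }
}
```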

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hmcl/storm-apache 1.x-branch_STORM-1694_and_STORM-2182

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/1779.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1779


commit 27970e79556e7ffd861e968c08293e63383609d1
Author: Hugo Louro 
Date:   2016-06-21T16:35:16Z

STORM-1694: Kafka Spout Trident Implementation Using New Kafka Consumer API
 - Kafka New Client - Opaque Transactional Trident Spout Implementation
 - Implementation supporting multiple named topics and wildcard topics

commit a680ec0a7df13525beb033806732b73c0e6620a3
Author: Hugo Louro 
Date:   2016-10-25T22:48:21Z

STORM-2182: Refactor Storm Kafka Examples Into Own Modules
  - Created modules
   - storm-kafka-examples
   - storm-kafka-client-examples






[GitHub] storm issue #1773: STORM-2198: perform RotationAction when stopping HdfsBolt

2016-11-15 Thread dossett
Github user dossett commented on the issue:

https://github.com/apache/storm/pull/1773
  
+1




[GitHub] storm issue #1746: STORM-1607: Add MongoMapState for supporting trident's ex...

2016-11-15 Thread vesense
Github user vesense commented on the issue:

https://github.com/apache/storm/pull/1746
  
Any comments are welcome.




[GitHub] storm issue #1778: [STORM-2082][SQL] add sql external module storm-sql-hdfs

2016-11-15 Thread vesense
Github user vesense commented on the issue:

https://github.com/apache/storm/pull/1778
  
@HeartSaVioR Addressed.




Re: [DISCUSS] Release Storm 1.1.0

2016-11-15 Thread Xin Wang
STORM-2198 (PR: https://github.com/apache/storm/pull/1773) fixes a bug in
storm-hdfs. Should we consider including it?

Thanks,
Xin Wang (vesense)

2016-11-15 10:03 GMT+08:00 Jungtaek Lim :

> Some issues on Storm SQL are resolved but not documented yet. I'll file an
> issue and assign to 1.1.0 release epic.
> And also I want to address dropping aggregation and join on Storm SQL
> Trident mode before releasing. I'll assign it too.
>
> - Jungtaek Lim (HeartSaVioR)
>
>
> On Tue, Nov 15, 2016 at 5:55 AM, P. Taylor Goetz wrote:
>
> > I think we’re very close. I would like to confirm that the 1.x-branch is
> > not affected by STORM-2176.
> >
> > The worker lifecycle API was added in 1.0, but doesn’t work in any
> > released version due to STORM-2176.
> >
> > If there are any other open JIRAs that anyone is passionate about, now
> > would be a good time to assign them to the 1.1.0 release epic
> (STORM-1856).
> >
> > -Taylor
> >
> >
> >
> > > On Oct 27, 2016, at 12:19 PM, Jungtaek Lim  wrote:
> > >
> > > Finally Pacemaker H/A, Supervisor V2, and Storm SQL PRs which were
> opened
> > > at the last mail (4 weeks ago) are all merged to 1.x branch.
> > >
> > > There're some more PRs on Storm SQL opened, but given that we can
> release
> > > new minor at any time when we feel it's enough change, I can wait for
> it.
> > > They didn't get reviewed yet indeed.
> > >
> > > Is there something else we would want to include it to 1.1.0?
> > >
> > > Thanks,
> > > Jungtaek Lim (HeartSaVioR)
> > >
> > > On Sat, Oct 1, 2016 at 9:30 AM, Jungtaek Lim wrote:
> > >
> > >> Personally, merging and porting back to three branches are painful
> > enough,
> > >> especially we don't have merging script and having verbose process (I
> > mean
> > >> CHANGELOG).
> > >> It would be better if merging process is automated (by running script
> or
> > >> so), so I'd +1 to revisit Harsha's suggestion (adopting Kafka merge
> > script)
> > >> and modify script to fit to Storm.
> > >> (It will not work if it's the case we need to handle PRs for each
> > version
> > >> line, since 'Close' in commit log doesn't close the PR if its target
> > branch
> > >> is not master.)
> > >>
> > >> Anyway, without automation I don't want to maintain more version
> lines.
> > >> I'm looking at the announces from other projects, and others are only
> > >> maintaining two version lines.
> > >> Since we maintain 2.0.0 version line we can't reduce version lines to
> 2,
> > >> but hopefully at most 3.
> > >>
> > >> Btw, let's check pending pull requests and enumerate which can be
> > included
> > >> in 1.0.0, and start/finish review and merge them soon.
> > >> For me Supervisor V2 and Pacemaker H/A, and pending Storm SQL PRs can
> be
> > >> included, since they are small or in reviewing and expected to pass
> > review
> > >> phase soon.
> > >> (And some small PRs. There're other valuable PRs in PR list but I'm
> not
> > >> sure we can review them soon. One example is unified API.)
> > >>
> > >> One issue which is not clear is STORM-2006
> > >> . This is a candidate for
> > me,
> > >> but gets blocked while reviewing. If we plan to put great effort to
> > revise
> > >> Metric we can skip this.
> > >>
> > >> Please enumerate other PRs as well if you want to include in 1.1.0.
> > >>
> > >> Thanks,
> > >> Jungtaek Lim
> > >>
> > >> On Fri, Sep 30, 2016 at 11:09 PM, Bobby Evans wrote:
> > >>
> > >> Sounds good to me.  It would be nice to get some of the new features
> > out.
> > >> Do we expect to maintain both 1.0.x and 1.1.x lines with bug fixes?
> > And if
> > >> so for how long do we want to do this for? - Bobby
> > >>
> > >>On Thursday, September 29, 2016 7:35 PM, Jungtaek Lim <
> > >> kabh...@gmail.com> wrote:
> > >>
> > >>
> > >> Hi devs,
> > >>
> > >> It's been 5 months after releasing Storm 1.0.0, and now 1.x branch has
> > lots
> > >> of CHANGELOG and also pending reviews.
> > >> It's also been a long time after 1.1.0 RC1 is canceled.
> > >>
> > >> I think it may be good to put some efforts to review and merge pending
> > pull
> > >> requests (except things which takes time to review and test), and
> > release
> > >> 1.1.0 soon.
> > >>
> > >> What do you think?
> > >>
> > >> I'm also open to volunteer release manager for 1.1.0 after we document
> > the
> > >> process of official release.
> > >>
> > >> Thanks,
> > >> Jungtaek Lim (HeartSaVioR)
> > >>
> > >>
> > >>
> > >>
> > >>
> >
> >
>


[GitHub] storm pull request #1778: [STORM-2082][SQL] add sql external module storm-sq...

2016-11-15 Thread vesense
Github user vesense commented on a diff in the pull request:

https://github.com/apache/storm/pull/1778#discussion_r87985904
  
--- Diff: 
external/sql/storm-sql-external/storm-sql-hdfs/src/test/org/apache/storm/sql/hdfs/TestHdfsDataSourcesProvider.java
 ---
@@ -0,0 +1,152 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.storm.sql.hdfs;
+
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.Lists;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileUtil;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.storm.hdfs.trident.HdfsState;
+import org.apache.storm.hdfs.trident.HdfsStateFactory;
+import org.apache.storm.hdfs.trident.HdfsUpdater;
+import org.apache.storm.sql.runtime.DataSourcesRegistry;
+import org.apache.storm.sql.runtime.FieldInfo;
+import org.apache.storm.sql.runtime.ISqlTridentDataSource;
+import org.apache.storm.sql.runtime.serde.json.JsonSerializer;
+import org.apache.storm.trident.state.StateUpdater;
+import org.apache.storm.trident.tuple.TridentTuple;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+import org.mockito.ArgumentMatcher;
+import org.mockito.internal.util.reflection.Whitebox;
+
+import java.io.File;
+import java.io.IOException;
+import java.net.URI;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Properties;
+
+import static org.apache.storm.hdfs.trident.HdfsState.HdfsFileOptions;
+import static org.mockito.Matchers.argThat;
+import static org.mockito.Mockito.doReturn;
+import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.verify;
+
+public class TestHdfsDataSourcesProvider {
+  private static final List<FieldInfo> FIELDS = ImmutableList.of(
+      new FieldInfo("ID", int.class, true),
+      new FieldInfo("val", String.class, false));
+  private static final List<String> FIELD_NAMES = ImmutableList.of("ID", "val");
+  private static final JsonSerializer SERIALIZER = new JsonSerializer(FIELD_NAMES);
+  private static final Properties TBL_PROPERTIES = new Properties();
+
+  private static String hdfsURI;
+  private static MiniDFSCluster hdfsCluster;
+
+  static {
+TBL_PROPERTIES.put("hdfs.file.path", "/unittest");
+TBL_PROPERTIES.put("hdfs.file.name", "test1.txt");
+TBL_PROPERTIES.put("hdfs.rotation.time.seconds", "120");
+  }
+
+  @Before
+  public void setup() throws Exception {
+Configuration conf = new Configuration();
+conf.set("fs.trash.interval", "10");
+conf.setBoolean("dfs.permissions", true);
+File baseDir = new File("./target/hdfs/").getAbsoluteFile();
+FileUtil.fullyDelete(baseDir);
+conf.set(MiniDFSCluster.HDFS_MINIDFS_BASEDIR, 
baseDir.getAbsolutePath());
+
+MiniDFSCluster.Builder builder = new MiniDFSCluster.Builder(conf);
+hdfsCluster = builder.build();
+hdfsURI = "hdfs://localhost:" + hdfsCluster.getNameNodePort() + "/";
+  }
+
+  @After
+  public void shutDown() throws IOException {
+hdfsCluster.shutdown();
+  }
+
+  @SuppressWarnings("unchecked")
+  @Test
+  public void testHdfsSink() {
+ISqlTridentDataSource ds = 
DataSourcesRegistry.constructTridentDataSource(
+URI.create(hdfsURI), null, null, TBL_PROPERTIES, FIELDS);
+Assert.assertNotNull(ds);
+
+ISqlTridentDataSource.SqlTridentConsumer consumer = ds.getConsumer();
+
+Assert.assertEquals(HdfsStateFactory.class, 
consumer.getStateFactory().getClass());
+Assert.assertEquals(HdfsUpdater.class, 
consumer.getStateUpdater().getClass());
+
+HdfsState state = (HdfsState) 
consumer.getStateFactory().makeState(Collections.emptyMap(), null, 0, 1);
+StateUpdater 

[GitHub] storm pull request #1778: [STORM-2082][SQL] add sql external module storm-sq...

2016-11-15 Thread vesense
Github user vesense commented on a diff in the pull request:

https://github.com/apache/storm/pull/1778#discussion_r87985832
  
--- Diff: 
external/sql/storm-sql-external/storm-sql-hdfs/src/jvm/org/apache/storm/sql/hdfs/HdfsDataSourcesProvider.java
 ---
@@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.storm.sql.hdfs;
+
+import org.apache.storm.hdfs.trident.HdfsState;
+import org.apache.storm.hdfs.trident.HdfsStateFactory;
+import org.apache.storm.hdfs.trident.HdfsUpdater;
+import org.apache.storm.hdfs.trident.format.FileNameFormat;
+import org.apache.storm.hdfs.trident.format.RecordFormat;
+import org.apache.storm.hdfs.trident.format.SimpleFileNameFormat;
+import org.apache.storm.hdfs.trident.rotation.FileRotationPolicy;
+import org.apache.storm.hdfs.trident.rotation.FileSizeRotationPolicy;
+import org.apache.storm.hdfs.trident.rotation.TimedRotationPolicy;
+import org.apache.storm.sql.runtime.DataSource;
+import org.apache.storm.sql.runtime.DataSourcesProvider;
+import org.apache.storm.sql.runtime.FieldInfo;
+import org.apache.storm.sql.runtime.IOutputSerializer;
+import org.apache.storm.sql.runtime.ISqlTridentDataSource;
+import org.apache.storm.sql.runtime.SimpleSqlTridentConsumer;
+import org.apache.storm.sql.runtime.utils.FieldInfoUtils;
+import org.apache.storm.sql.runtime.utils.SerdeUtils;
+import org.apache.storm.trident.spout.ITridentDataSource;
+import org.apache.storm.trident.state.StateFactory;
+import org.apache.storm.trident.state.StateUpdater;
+import org.apache.storm.trident.tuple.TridentTuple;
+
+import java.net.URI;
+import java.util.List;
+import java.util.Properties;
+
+/**
+ * Creates an HDFS sink based on the URI and properties. The URI has the format hdfs://host:port/path-to-file.
+ * The properties are JSON and specify, among other things, the name and path of the HDFS file.
+ */
+public class HdfsDataSourcesProvider implements DataSourcesProvider {
+
+  private static class HdfsTridentDataSource implements 
ISqlTridentDataSource {
+private final String url;
+private final Properties props;
+private final IOutputSerializer serializer;
+
+private HdfsTridentDataSource(String url, Properties props, 
IOutputSerializer serializer) {
+  this.url = url;
+  this.props = props;
+  this.serializer = serializer;
+}
+
+@Override
+public ITridentDataSource getProducer() {
+  throw new UnsupportedOperationException(this.getClass().getName() + 
" doesn't provide Producer");
+}
+
+@Override
+public SqlTridentConsumer getConsumer() {
+  FileNameFormat fileNameFormat = new SimpleFileNameFormat()
+  .withPath(props.getProperty("hdfs.file.path", "/storm"))
+  .withName(props.getProperty("hdfs.file.name", "$TIME.$NUM.txt"));
+
+  RecordFormat recordFormat = new TridentRecordFormat(serializer);
+
+  FileRotationPolicy rotationPolicy;
+  String size = props.getProperty("hdfs.rotation.size.kb");
+  if (size != null) {
+rotationPolicy = new 
FileSizeRotationPolicy(Float.parseFloat(size), FileSizeRotationPolicy.Units.KB);
+  } else {
+float interval = 
Float.parseFloat(props.getProperty("hdfs.rotation.time.seconds", "600")); // 
default 600 seconds
--- End diff --

No default value is used in storm-hdfs. I will add a preconditions check.
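
For reference, a rough sketch of what such a check could look like, assuming Guava's `Preconditions` is available on the classpath and using the property names from the diff above; this is only an illustration, not the actual patch:

```java
import com.google.common.base.Preconditions;

import java.util.Properties;

// Sketch only: require at least one rotation setting before a FileRotationPolicy is built.
final class RotationPropsCheck {

    private RotationPropsCheck() {
    }

    static void requireRotationPolicy(Properties props) {
        String sizeKb = props.getProperty("hdfs.rotation.size.kb");
        String timeSeconds = props.getProperty("hdfs.rotation.time.seconds");
        Preconditions.checkArgument(sizeKb != null || timeSeconds != null,
                "Either hdfs.rotation.size.kb or hdfs.rotation.time.seconds must be specified.");
    }
}
```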




[GitHub] storm pull request #1778: [STORM-2082][SQL] add sql external module storm-sq...

2016-11-15 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request:

https://github.com/apache/storm/pull/1778#discussion_r87976945
  
--- Diff: 
external/sql/storm-sql-external/storm-sql-hdfs/src/test/org/apache/storm/sql/hdfs/TestHdfsDataSourcesProvider.java
 ---
@@ -0,0 +1,152 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.storm.sql.hdfs;
+
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.Lists;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileUtil;
+import org.apache.hadoop.hdfs.MiniDFSCluster;
+import org.apache.storm.hdfs.trident.HdfsState;
+import org.apache.storm.hdfs.trident.HdfsStateFactory;
+import org.apache.storm.hdfs.trident.HdfsUpdater;
+import org.apache.storm.sql.runtime.DataSourcesRegistry;
+import org.apache.storm.sql.runtime.FieldInfo;
+import org.apache.storm.sql.runtime.ISqlTridentDataSource;
+import org.apache.storm.sql.runtime.serde.json.JsonSerializer;
+import org.apache.storm.trident.state.StateUpdater;
+import org.apache.storm.trident.tuple.TridentTuple;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+import org.mockito.ArgumentMatcher;
+import org.mockito.internal.util.reflection.Whitebox;
+
+import java.io.File;
+import java.io.IOException;
+import java.net.URI;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Properties;
+
+import static org.apache.storm.hdfs.trident.HdfsState.HdfsFileOptions;
+import static org.mockito.Matchers.argThat;
+import static org.mockito.Mockito.doReturn;
+import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.verify;
+
+public class TestHdfsDataSourcesProvider {
+  private static final List<FieldInfo> FIELDS = ImmutableList.of(
+      new FieldInfo("ID", int.class, true),
+      new FieldInfo("val", String.class, false));
+  private static final List<String> FIELD_NAMES = ImmutableList.of("ID", "val");
+  private static final JsonSerializer SERIALIZER = new JsonSerializer(FIELD_NAMES);
+  private static final Properties TBL_PROPERTIES = new Properties();
+
+  private static String hdfsURI;
+  private static MiniDFSCluster hdfsCluster;
+
+  static {
+TBL_PROPERTIES.put("hdfs.file.path", "/unittest");
+TBL_PROPERTIES.put("hdfs.file.name", "test1.txt");
+TBL_PROPERTIES.put("hdfs.rotation.time.seconds", "120");
+  }
+
+  @Before
+  public void setup() throws Exception {
+Configuration conf = new Configuration();
+conf.set("fs.trash.interval", "10");
+conf.setBoolean("dfs.permissions", true);
+File baseDir = new File("./target/hdfs/").getAbsoluteFile();
+FileUtil.fullyDelete(baseDir);
+conf.set(MiniDFSCluster.HDFS_MINIDFS_BASEDIR, 
baseDir.getAbsolutePath());
+
+MiniDFSCluster.Builder builder = new MiniDFSCluster.Builder(conf);
+hdfsCluster = builder.build();
+hdfsURI = "hdfs://localhost:" + hdfsCluster.getNameNodePort() + "/";
+  }
+
+  @After
+  public void shutDown() throws IOException {
+hdfsCluster.shutdown();
+  }
+
+  @SuppressWarnings("unchecked")
+  @Test
+  public void testHdfsSink() {
+ISqlTridentDataSource ds = 
DataSourcesRegistry.constructTridentDataSource(
+URI.create(hdfsURI), null, null, TBL_PROPERTIES, FIELDS);
+Assert.assertNotNull(ds);
+
+ISqlTridentDataSource.SqlTridentConsumer consumer = ds.getConsumer();
+
+Assert.assertEquals(HdfsStateFactory.class, 
consumer.getStateFactory().getClass());
+Assert.assertEquals(HdfsUpdater.class, 
consumer.getStateUpdater().getClass());
+
+HdfsState state = (HdfsState) 
consumer.getStateFactory().makeState(Collections.emptyMap(), null, 0, 1);
+

[GitHub] storm pull request #1778: [STORM-2082][SQL] add sql external module storm-sq...

2016-11-15 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request:

https://github.com/apache/storm/pull/1778#discussion_r87973924
  
--- Diff: 
external/sql/storm-sql-external/storm-sql-hdfs/src/jvm/org/apache/storm/sql/hdfs/HdfsDataSourcesProvider.java
 ---
@@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.storm.sql.hdfs;
+
+import org.apache.storm.hdfs.trident.HdfsState;
+import org.apache.storm.hdfs.trident.HdfsStateFactory;
+import org.apache.storm.hdfs.trident.HdfsUpdater;
+import org.apache.storm.hdfs.trident.format.FileNameFormat;
+import org.apache.storm.hdfs.trident.format.RecordFormat;
+import org.apache.storm.hdfs.trident.format.SimpleFileNameFormat;
+import org.apache.storm.hdfs.trident.rotation.FileRotationPolicy;
+import org.apache.storm.hdfs.trident.rotation.FileSizeRotationPolicy;
+import org.apache.storm.hdfs.trident.rotation.TimedRotationPolicy;
+import org.apache.storm.sql.runtime.DataSource;
+import org.apache.storm.sql.runtime.DataSourcesProvider;
+import org.apache.storm.sql.runtime.FieldInfo;
+import org.apache.storm.sql.runtime.IOutputSerializer;
+import org.apache.storm.sql.runtime.ISqlTridentDataSource;
+import org.apache.storm.sql.runtime.SimpleSqlTridentConsumer;
+import org.apache.storm.sql.runtime.utils.FieldInfoUtils;
+import org.apache.storm.sql.runtime.utils.SerdeUtils;
+import org.apache.storm.trident.spout.ITridentDataSource;
+import org.apache.storm.trident.state.StateFactory;
+import org.apache.storm.trident.state.StateUpdater;
+import org.apache.storm.trident.tuple.TridentTuple;
+
+import java.net.URI;
+import java.util.List;
+import java.util.Properties;
+
+/**
+ * Creates an HDFS sink based on the URI and properties. The URI has the format hdfs://host:port/path-to-file.
+ * The properties are JSON and specify, among other things, the name and path of the HDFS file.
+ */
+public class HdfsDataSourcesProvider implements DataSourcesProvider {
+
+  private static class HdfsTridentDataSource implements 
ISqlTridentDataSource {
+private final String url;
+private final Properties props;
+private final IOutputSerializer serializer;
+
+private HdfsTridentDataSource(String url, Properties props, 
IOutputSerializer serializer) {
+  this.url = url;
+  this.props = props;
+  this.serializer = serializer;
+}
+
+@Override
+public ITridentDataSource getProducer() {
+  throw new UnsupportedOperationException(this.getClass().getName() + 
" doesn't provide Producer");
+}
+
+@Override
+public SqlTridentConsumer getConsumer() {
+  FileNameFormat fileNameFormat = new SimpleFileNameFormat()
+  .withPath(props.getProperty("hdfs.file.path", "/storm"))
+  .withName(props.getProperty("hdfs.file.name", "$TIME.$NUM.txt"));
+
+  RecordFormat recordFormat = new TridentRecordFormat(serializer);
+
+  FileRotationPolicy rotationPolicy;
+  String size = props.getProperty("hdfs.rotation.size.kb");
+  if (size != null) {
+rotationPolicy = new 
FileSizeRotationPolicy(Float.parseFloat(size), FileSizeRotationPolicy.Units.KB);
+  } else {
+float interval = 
Float.parseFloat(props.getProperty("hdfs.rotation.time.seconds", "600")); // 
default 600 seconds
--- End diff --

Is a default value also used in storm-hdfs? If not, let's require either 
`hdfs.rotation.size.kb` or `hdfs.rotation.time.seconds`.




[GitHub] storm pull request #1777: STORM-2202 [Storm SQL] Document how to use support...

2016-11-15 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request:

https://github.com/apache/storm/pull/1777#discussion_r87968979
  
--- Diff: docs/storm-sql-reference.md ---
@@ -1203,4 +1203,103 @@ and class for aggregate function is here:
 For now users can skip implementing `result` method if it doesn't need 
transform accumulated value, 
 but this behavior is subject to change so providing `result` is 
recommended. 
 
-Please note that users should use `--jars` or `--artifacts` while running 
Storm SQL runner to make sure UDFs and/or UDAFs are available in classpath. 
\ No newline at end of file
+Please note that users should use `--jars` or `--artifacts` while running 
Storm SQL runner to make sure UDFs and/or UDAFs are available in classpath.
+
+## External Data Sources
+
+### Specifying External Data Sources
+
+In StormSQL data is represented by external tables. Users can specify data 
sources using the `CREATE EXTERNAL TABLE` statement. The syntax of `CREATE 
EXTERNAL TABLE` closely follows the one defined in [Hive Data Definition 
Language](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL):
+
+```
+CREATE EXTERNAL TABLE table_name field_list
+[ STORED AS
+  INPUTFORMAT input_format_classname
+  OUTPUTFORMAT output_format_classname
+]
+LOCATION location
+[ TBLPROPERTIES tbl_properties ]
+[ AS select_stmt ]
+```
+
+The default input and output format is JSON. Supported formats are described in a later section.
+
+For example, the following statement specifies a Kafka spout and sink:
+
+```
+CREATE EXTERNAL TABLE FOO (ID INT PRIMARY KEY) LOCATION 
'kafka://localhost:2181/brokers?topic=test' TBLPROPERTIES 
'{"producer":{"bootstrap.servers":"localhost:9092","acks":"1","key.serializer":"org.apache.org.apache.storm.kafka.IntSerializer","value.serializer":"org.apache.org.apache.storm.kafka.ByteBufferSerializer"}}'
+```
+
+### Plugging in External Data Sources
+
+Users plug in external data sources by implementing the `ISqlTridentDataSource` interface and registering them via Java's service loader mechanism. The external data source is chosen based on the scheme of the table's URI. Please refer to the implementation of `storm-sql-kafka` for more details.
+
+### Supported Formats
+
+| Format  | Input format class | Output format class | Requires properties
+|:--- |:-- |:--- |:---
+| JSON | org.apache.storm.sql.runtime.serde.json.JsonScheme | org.apache.storm.sql.runtime.serde.json.JsonSerializer | No
+| Avro | org.apache.storm.sql.runtime.serde.avro.AvroScheme | org.apache.storm.sql.runtime.serde.avro.AvroSerializer | Yes
+| CSV  | org.apache.storm.sql.runtime.serde.csv.CsvScheme | org.apache.storm.sql.runtime.serde.csv.CsvSerializer | No
+| TSV  | org.apache.storm.sql.runtime.serde.tsv.TsvScheme | org.apache.storm.sql.runtime.serde.tsv.TsvSerializer | No
+
+#### Avro
+
+Avro requires users to describe the record schema (for both input and output) in `TBLPROPERTIES`.
+The input schema is specified via `input.avro.schema`, and the output schema via `output.avro.schema`.
+The schema string should be escaped JSON so that `TBLPROPERTIES` remains valid JSON.
+
+Example Schema description:
+
+`"input.avro.schema": "{\"type\": \"record\", \"name\": \"large_orders\", 
\"fields\" : [ {\"name\": \"ID\", \"type\": \"int\"}, {\"name\": \"TOTAL\", 
\"type\": \"int\"} ]}"`
+
+`"output.avro.schema": "{\"type\": \"record\", \"name\": \"large_orders\", 
\"fields\" : [ {\"name\": \"ID\", \"type\": \"int\"}, {\"name\": \"TOTAL\", 
\"type\": \"int\"} ]}"`
+
+#### CSV
+
+It uses a standard RFC 4180 CSV parser and doesn't need any other properties.
+
+#### TSV
+
+By default TSV uses `\t` as the delimiter, but users can set another delimiter via `input.tsv.delimiter` and/or `output.tsv.delimiter`.
+Please note that only a single character is supported as the delimiter.
+
+### Supported Data Sources
+
+| Data Source | Artifact Name  | Location prefix | Support Input data source | Support Output data source | Requires properties
+|:--- |:-- |:--- |:- |:-- |:---
+| Kafka | org.apache.storm:storm-sql-kafka | `kafka://zkhost:port/broker_path?topic=topic` | Yes | Yes | Yes
+| Redis | org.apache.storm:storm-sql-redis | `redis://:[password]@host:port/[dbIdx]` | No | Yes | Yes
+| MongoDB | org.apache.storm:storm-sql-mongodb | 

[GitHub] storm pull request #1777: STORM-2202 [Storm SQL] Document how to use support...

2016-11-15 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request:

https://github.com/apache/storm/pull/1777#discussion_r87968697
  
--- Diff: docs/storm-sql-reference.md ---
@@ -1203,4 +1203,103 @@ and class for aggregate function is here:
 For now users can skip implementing `result` method if it doesn't need 
transform accumulated value, 
 but this behavior is subject to change so providing `result` is 
recommended. 
 
-Please note that users should use `--jars` or `--artifacts` while running 
Storm SQL runner to make sure UDFs and/or UDAFs are available in classpath. 
\ No newline at end of file
+Please note that users should use `--jars` or `--artifacts` while running 
Storm SQL runner to make sure UDFs and/or UDAFs are available in classpath.
+
+## External Data Sources
+
+### Specifying External Data Sources
+
+In StormSQL data is represented by external tables. Users can specify data 
sources using the `CREATE EXTERNAL TABLE` statement. The syntax of `CREATE 
EXTERNAL TABLE` closely follows the one defined in [Hive Data Definition 
Language](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL):
+
+```
+CREATE EXTERNAL TABLE table_name field_list
+[ STORED AS
+  INPUTFORMAT input_format_classname
+  OUTPUTFORMAT output_format_classname
+]
+LOCATION location
+[ TBLPROPERTIES tbl_properties ]
+[ AS select_stmt ]
+```
+
+The default input and output format is JSON. Supported formats are described in a later section.
+
+For example, the following statement specifies a Kafka spout and sink:
+
+```
+CREATE EXTERNAL TABLE FOO (ID INT PRIMARY KEY) LOCATION 
'kafka://localhost:2181/brokers?topic=test' TBLPROPERTIES 
'{"producer":{"bootstrap.servers":"localhost:9092","acks":"1","key.serializer":"org.apache.org.apache.storm.kafka.IntSerializer","value.serializer":"org.apache.org.apache.storm.kafka.ByteBufferSerializer"}}'
+```
+
+### Plugging in External Data Sources
+
+Users plug in external data sources by implementing the `ISqlTridentDataSource` interface and registering them via Java's service loader mechanism. The external data source is chosen based on the scheme of the table's URI. Please refer to the implementation of `storm-sql-kafka` for more details.
+
+### Supported Formats
+
+| Format  | Input format class | Output format class | Requires properties
+|:--- |:-- |:--- |:---
+| JSON | org.apache.storm.sql.runtime.serde.json.JsonScheme | org.apache.storm.sql.runtime.serde.json.JsonSerializer | No
+| Avro | org.apache.storm.sql.runtime.serde.avro.AvroScheme | org.apache.storm.sql.runtime.serde.avro.AvroSerializer | Yes
+| CSV  | org.apache.storm.sql.runtime.serde.csv.CsvScheme | org.apache.storm.sql.runtime.serde.csv.CsvSerializer | No
+| TSV  | org.apache.storm.sql.runtime.serde.tsv.TsvScheme | org.apache.storm.sql.runtime.serde.tsv.TsvSerializer | No
+
+#### Avro
+
+Avro requires users to describe the record schema (for both input and output) in `TBLPROPERTIES`.
+The input schema is specified via `input.avro.schema`, and the output schema via `output.avro.schema`.
+The schema string should be escaped JSON so that `TBLPROPERTIES` remains valid JSON.
+
+Example Schema description:
+
+`"input.avro.schema": "{\"type\": \"record\", \"name\": \"large_orders\", 
\"fields\" : [ {\"name\": \"ID\", \"type\": \"int\"}, {\"name\": \"TOTAL\", 
\"type\": \"int\"} ]}"`
+
+`"output.avro.schema": "{\"type\": \"record\", \"name\": \"large_orders\", 
\"fields\" : [ {\"name\": \"ID\", \"type\": \"int\"}, {\"name\": \"TOTAL\", 
\"type\": \"int\"} ]}"`
+
+#### CSV
+
+It uses a standard RFC 4180 CSV parser and doesn't need any other properties.
--- End diff --

Yes that would be a good idea. Will address.




[GitHub] storm issue #1777: STORM-2202 [Storm SQL] Document how to use supported conn...

2016-11-15 Thread vesense
Github user vesense commented on the issue:

https://github.com/apache/storm/pull/1777
  
Thanks @HeartSaVioR. Just two minor comments; everything else looks good to me. +1




[GitHub] storm pull request #1777: STORM-2202 [Storm SQL] Document how to use support...

2016-11-15 Thread vesense
Github user vesense commented on a diff in the pull request:

https://github.com/apache/storm/pull/1777#discussion_r87967670
  
--- Diff: docs/storm-sql-reference.md ---
@@ -1203,4 +1203,103 @@ and class for aggregate function is here:
 For now users can skip implementing `result` method if it doesn't need 
transform accumulated value, 
 but this behavior is subject to change so providing `result` is 
recommended. 
 
-Please note that users should use `--jars` or `--artifacts` while running 
Storm SQL runner to make sure UDFs and/or UDAFs are available in classpath. 
\ No newline at end of file
+Please note that users should use `--jars` or `--artifacts` while running 
Storm SQL runner to make sure UDFs and/or UDAFs are available in classpath.
+
+## External Data Sources
+
+### Specifying External Data Sources
+
+In StormSQL data is represented by external tables. Users can specify data 
sources using the `CREATE EXTERNAL TABLE` statement. The syntax of `CREATE 
EXTERNAL TABLE` closely follows the one defined in [Hive Data Definition 
Language](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL):
+
+```
+CREATE EXTERNAL TABLE table_name field_list
+[ STORED AS
+  INPUTFORMAT input_format_classname
+  OUTPUTFORMAT output_format_classname
+]
+LOCATION location
+[ TBLPROPERTIES tbl_properties ]
+[ AS select_stmt ]
+```
+
+The default input and output format is JSON. Supported formats are described in a later section.
+
+For example, the following statement specifies a Kafka spout and sink:
+
+```
+CREATE EXTERNAL TABLE FOO (ID INT PRIMARY KEY) LOCATION 
'kafka://localhost:2181/brokers?topic=test' TBLPROPERTIES 
'{"producer":{"bootstrap.servers":"localhost:9092","acks":"1","key.serializer":"org.apache.org.apache.storm.kafka.IntSerializer","value.serializer":"org.apache.org.apache.storm.kafka.ByteBufferSerializer"}}'
+```
+
+### Plugging in External Data Sources
+
+Users plug in external data sources by implementing the `ISqlTridentDataSource` interface and registering them via Java's service loader mechanism. The external data source is chosen based on the scheme of the table's URI. Please refer to the implementation of `storm-sql-kafka` for more details.
+
+### Supported Formats
+
+| Format  | Input format class | Output format class | Requires properties
+|:--- |:-- |:--- |:---
+| JSON | org.apache.storm.sql.runtime.serde.json.JsonScheme | org.apache.storm.sql.runtime.serde.json.JsonSerializer | No
+| Avro | org.apache.storm.sql.runtime.serde.avro.AvroScheme | org.apache.storm.sql.runtime.serde.avro.AvroSerializer | Yes
+| CSV  | org.apache.storm.sql.runtime.serde.csv.CsvScheme | org.apache.storm.sql.runtime.serde.csv.CsvSerializer | No
+| TSV  | org.apache.storm.sql.runtime.serde.tsv.TsvScheme | org.apache.storm.sql.runtime.serde.tsv.TsvSerializer | No
+
+#### Avro
+
+Avro requires users to describe the record schema (for both input and output) in `TBLPROPERTIES`.
+The input schema is specified via `input.avro.schema`, and the output schema via `output.avro.schema`.
+The schema string should be escaped JSON so that `TBLPROPERTIES` remains valid JSON.
+
+Example Schema description:
+
+`"input.avro.schema": "{\"type\": \"record\", \"name\": \"large_orders\", 
\"fields\" : [ {\"name\": \"ID\", \"type\": \"int\"}, {\"name\": \"TOTAL\", 
\"type\": \"int\"} ]}"`
+
+`"output.avro.schema": "{\"type\": \"record\", \"name\": \"large_orders\", 
\"fields\" : [ {\"name\": \"ID\", \"type\": \"int\"}, {\"name\": \"TOTAL\", 
\"type\": \"int\"} ]}"`
+
+#### CSV
+
+It uses a standard RFC 4180 CSV parser and doesn't need any other properties.
--- End diff --

Minor: how about adding a link to RFC 4180? It would be convenient for users who want to look it up.




[GitHub] storm pull request #1777: STORM-2202 [Storm SQL] Document how to use support...

2016-11-15 Thread vesense
Github user vesense commented on a diff in the pull request:

https://github.com/apache/storm/pull/1777#discussion_r87968161
  
--- Diff: docs/storm-sql-reference.md ---
@@ -1203,4 +1203,103 @@ and class for aggregate function is here:
 For now users can skip implementing `result` method if it doesn't need 
transform accumulated value, 
 but this behavior is subject to change so providing `result` is 
recommended. 
 
-Please note that users should use `--jars` or `--artifacts` while running 
Storm SQL runner to make sure UDFs and/or UDAFs are available in classpath. 
\ No newline at end of file
+Please note that users should use `--jars` or `--artifacts` while running 
Storm SQL runner to make sure UDFs and/or UDAFs are available in classpath.
+
+## External Data Sources
+
+### Specifying External Data Sources
+
+In StormSQL data is represented by external tables. Users can specify data 
sources using the `CREATE EXTERNAL TABLE` statement. The syntax of `CREATE 
EXTERNAL TABLE` closely follows the one defined in [Hive Data Definition 
Language](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL):
+
+```
+CREATE EXTERNAL TABLE table_name field_list
+[ STORED AS
+  INPUTFORMAT input_format_classname
+  OUTPUTFORMAT output_format_classname
+]
+LOCATION location
+[ TBLPROPERTIES tbl_properties ]
+[ AS select_stmt ]
+```
+
+The default input and output format is JSON. Supported formats are described in a later section.
+
+For example, the following statement specifies a Kafka spout and sink:
+
+```
+CREATE EXTERNAL TABLE FOO (ID INT PRIMARY KEY) LOCATION 
'kafka://localhost:2181/brokers?topic=test' TBLPROPERTIES 
'{"producer":{"bootstrap.servers":"localhost:9092","acks":"1","key.serializer":"org.apache.org.apache.storm.kafka.IntSerializer","value.serializer":"org.apache.org.apache.storm.kafka.ByteBufferSerializer"}}'
+```
+
+### Plugging in External Data Sources
+
+Users plug in external data sources by implementing the `ISqlTridentDataSource` interface and registering them via Java's service loader mechanism. The external data source is chosen based on the scheme of the table's URI. Please refer to the implementation of `storm-sql-kafka` for more details.
+
+### Supported Formats
+
+| Format  | Input format class | Output format class | Requires properties
+|:--- |:-- |:--- |:---
+| JSON | org.apache.storm.sql.runtime.serde.json.JsonScheme | org.apache.storm.sql.runtime.serde.json.JsonSerializer | No
+| Avro | org.apache.storm.sql.runtime.serde.avro.AvroScheme | org.apache.storm.sql.runtime.serde.avro.AvroSerializer | Yes
+| CSV  | org.apache.storm.sql.runtime.serde.csv.CsvScheme | org.apache.storm.sql.runtime.serde.csv.CsvSerializer | No
+| TSV  | org.apache.storm.sql.runtime.serde.tsv.TsvScheme | org.apache.storm.sql.runtime.serde.tsv.TsvSerializer | No
+
+#### Avro
+
+Avro requires users to describe the record schema (for both input and output) in `TBLPROPERTIES`.
+The input schema is specified via `input.avro.schema`, and the output schema via `output.avro.schema`.
+The schema string should be escaped JSON so that `TBLPROPERTIES` remains valid JSON.
+
+Example Schema description:
+
+`"input.avro.schema": "{\"type\": \"record\", \"name\": \"large_orders\", 
\"fields\" : [ {\"name\": \"ID\", \"type\": \"int\"}, {\"name\": \"TOTAL\", 
\"type\": \"int\"} ]}"`
+
+`"output.avro.schema": "{\"type\": \"record\", \"name\": \"large_orders\", 
\"fields\" : [ {\"name\": \"ID\", \"type\": \"int\"}, {\"name\": \"TOTAL\", 
\"type\": \"int\"} ]}"`
+
+#### CSV
+
+It uses a standard RFC 4180 CSV parser and doesn't need any other properties.
+
+#### TSV
+
+By default TSV uses `\t` as the delimiter, but users can set another delimiter via `input.tsv.delimiter` and/or `output.tsv.delimiter`.
+Please note that only a single character is supported as the delimiter.
+
+### Supported Data Sources
+
+| Data Source | Artifact Name  | Location prefix | Support Input data source | Support Output data source | Requires properties
+|:--- |:-- |:--- |:- |:-- |:---
+| Kafka | org.apache.storm:storm-sql-kafka | `kafka://zkhost:port/broker_path?topic=topic` | Yes | Yes | Yes
+| Redis | org.apache.storm:storm-sql-redis | `redis://:[password]@host:port/[dbIdx]` | No | Yes | Yes
+| MongoDB | org.apache.storm:storm-sql-mongodb | 

[GitHub] storm pull request #1778: [STORM-2082][SQL] add sql external module storm-sq...

2016-11-15 Thread vesense
GitHub user vesense opened a pull request:

https://github.com/apache/storm/pull/1778

[STORM-2082][SQL] add sql external module storm-sql-hdfs



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vesense/storm STORM-2082

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/1778.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1778


commit c9c565464be732de4a4035bd6ead85ccf8204949
Author: Xin Wang 
Date:   2016-11-13T10:25:27Z

[STORM-2082][SQL] add sql external module storm-sql-hdfs



