Re: [VOTE] Apache DataFu 1.4.0 release RC1

2018-03-19 Thread Sam Shah
+1

On Mon, Mar 19, 2018 at 11:00 AM, Matthew Hayes wrote:

> Hi all,
>
> I'd like to call a vote to release Apache DataFu 1.4.0.  This is the first
> release since graduating.  Given this and that the last release was 1.3.3 I
> thought it was worth bumping the minor version.
>
> The source release candidate RC1 can be downloaded here:
>
> https://dist.apache.org/repos/dist/dev/datafu/apache-datafu-1.4.0-rc1/
>
> The artifacts (i.e. JARs) corresponding to this release candidate can be
> found here:
>
> https://repository.apache.org/content/repositories/orgapachedatafu-1008/
>
> This has been signed with PGP key 7BA4C7DF, corresponding to
> mha...@apache.org, which is included in the repository's KEYS file.  This
> key can be found on keyservers, such as:
>
> http://pgp.mit.edu/pks/lookup?op=get&search=0x7BA4C7DF
>
> It is also listed here:
>
> https://people.apache.org/keys/group/datafu.asc
>
> The release candidate has been tagged with release-1.4.0-rc1, which has
> been signed with the same key.  I've also created a branch 1.4.0.
>
> For reference, here is a list of all closed JIRAs tagged with 1.4.0:
>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20DATAFU%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%201.4.0%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC
>
> For a summary of the changes in this release, see:
>
> https://git-wip-us.apache.org/repos/asf?p=datafu.git;a=blob_plain;f=changes.md;hb=refs/heads/1.4.0
>
> Please download the release candidate, check the hashes, check the
> signatures, test it, and vote.  The vote will be open for 72 hours (ends on
> March 22nd 11 am PST).
>
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove (and reason why)
>
> My vote: +1
>
> Thanks,
> Matthew Hayes
>


[VOTE] Apache DataFu 1.4.0 release RC1

2018-03-19 Thread Matthew Hayes
Hi all,

I'd like to call a vote to release Apache DataFu 1.4.0.  This is the first
release since graduating.  Given this and that the last release was 1.3.3 I
thought it was worth bumping the minor version.

The source release candidate RC1 can be downloaded here:

https://dist.apache.org/repos/dist/dev/datafu/apache-datafu-1.4.0-rc1/

The artifacts (i.e. JARs) corresponding to this release candidate can be
found here:

https://repository.apache.org/content/repositories/orgapachedatafu-1008/

This has been signed with PGP key 7BA4C7DF, corresponding to
mha...@apache.org, which is included in the repository's KEYS file.  This
key can be found on keyservers, such as:

http://pgp.mit.edu/pks/lookup?op=get&search=0x7BA4C7DF

It is also listed here:

https://people.apache.org/keys/group/datafu.asc

The release candidate has been tagged with release-1.4.0-rc1, which has
been signed with the same key.  I've also created a branch 1.4.0.

For reference, here is a list of all closed JIRAs tagged with 1.4.0:

https://issues.apache.org/jira/issues/?jql=project%20%3D%20DATAFU%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%201.4.0%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC

For a summary of the changes in this release, see:

https://git-wip-us.apache.org/repos/asf?p=datafu.git;a=blob_plain;f=changes.md;hb=refs/heads/1.4.0

Please download the release candidate, check the hashes, check the
signatures, test it, and vote.  The vote will be open for 72 hours (ends on
March 22nd 11 am PST).

[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove (and reason why)

My vote: +1

Thanks,
Matthew Hayes
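
The hash check requested in the vote email above could be scripted roughly as follows. This is only a sketch: it assumes the release candidate ships a SHA-512 checksum file next to the source archive, and the function names are hypothetical, not part of any DataFu tooling.

```python
import hashlib


def sha512_hex(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large archive is never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, checksum_text):
    """Check whether the file's digest appears in the text of a checksum file.

    Whitespace is removed and case is ignored, so a plain
    "<digest>  <filename>" line and a digest printed in spaced groups
    are both accepted.
    """
    normalized = "".join(checksum_text.split()).lower()
    return sha512_hex(path) in normalized
```

Signatures are a separate step; `gpg --verify <signature file> <artifact>` is the usual way to check them once the KEYS file has been imported.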


[jira] [Commented] (DATAFU-127) New macro - sample by keys

2018-03-19 Thread Matthew Hayes (JIRA)

[ https://issues.apache.org/jira/browse/DATAFU-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405138#comment-16405138 ]

Matthew Hayes commented on DATAFU-127:
--------------------------------------

This seems useful. However, should it instead be named {{filter_by_keys}}?
To me, "sample" implies some sort of random selection.

> New macro - sample by keys
> --------------------------
>
>                 Key: DATAFU-127
>                 URL: https://issues.apache.org/jira/browse/DATAFU-127
>             Project: DataFu
>          Issue Type: New Feature
>            Reporter: Eyal Allweil
>            Assignee: Eyal Allweil
>            Priority: Major
>              Labels: macro
>         Attachments: DATAFU-127.patch
>
>
> Two macros that return a sample of a larger table based on a list of keys, 
> with the schema of the larger table. One of the macros filters by dates, the 
> other doesn't.
> If there are multiple rows with a key that appears in the key list, all of 
> them will be returned (no deduplication is done). The results are returned 
> ordered by the key field in a single file.
> The implementation uses a replicated join for efficiency, which means the 
> key list must be small enough to fit in memory.
> The first macro's definition looks as follows:
> DEFINE sample_by_keys(table, sample_set, join_key_table, join_key_sample) 
> returns out {
> - table_name  - table name to sample
> - sample_set  - a set of keys
> - join_key_table  - join column name in the table
> - join_key_sample - join column name in the sample
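
The behaviour described in the issue above (keep every row whose key is in the key list, no deduplication, results ordered by key) can be illustrated outside Pig. The following is a rough sketch, not the macro's actual implementation: rows are modelled as dictionaries, the parameter names follow the macro signature quoted above, and the in-memory set stands in for the small side of the replicated join.

```python
def sample_by_keys(table, sample_set, join_key_table, join_key_sample):
    """Return every row of `table` whose key appears in `sample_set`, ordered by key."""
    # The key list is held in memory, like the small side of a replicated join.
    wanted = {row[join_key_sample] for row in sample_set}
    # No deduplication: every matching row is kept.
    matches = [row for row in table if row[join_key_table] in wanted]
    # sorted() is stable, so rows sharing a key keep their input order.
    return sorted(matches, key=lambda row: row[join_key_table])


# Hypothetical data: two rows share the key 1 and both are returned.
table = [
    {"id": 3, "v": "c"},
    {"id": 1, "v": "a"},
    {"id": 2, "v": "b"},
    {"id": 1, "v": "d"},
]
keys = [{"key": 1}, {"key": 3}]
result = sample_by_keys(table, keys, "id", "key")
```

Because the selection is deterministic rather than random, the sketch also shows why a name such as filter_by_keys was suggested in the comment.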



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DATAFU-145) Update release documentation RELEASE.md

2018-03-19 Thread Matthew Hayes (JIRA)
Matthew Hayes created DATAFU-145:


             Summary: Update release documentation RELEASE.md
                 Key: DATAFU-145
                 URL: https://issues.apache.org/jira/browse/DATAFU-145
             Project: DataFu
          Issue Type: Sub-task
            Reporter: Matthew Hayes




[jira] [Updated] (DATAFU-140) Remove MD5 hash from release task

2018-03-19 Thread Matthew Hayes (JIRA)

[ https://issues.apache.org/jira/browse/DATAFU-140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew Hayes updated DATAFU-140:
---------------------------------
Affects Version/s: 1.3.3

> Remove MD5 hash from release task
> ---------------------------------
>
>                 Key: DATAFU-140
>                 URL: https://issues.apache.org/jira/browse/DATAFU-140
>             Project: DataFu
>          Issue Type: Task
>    Affects Versions: 1.3.3
>            Reporter: Matthew Hayes
>            Priority: Major
>             Fix For: 1.4.0
>
>
> MD5 should no longer be included in releases:
>  
> http://www.apache.org/dev/release-distribution





[jira] [Updated] (DATAFU-140) Remove MD5 hash from release task

2018-03-19 Thread Matthew Hayes (JIRA)

[ https://issues.apache.org/jira/browse/DATAFU-140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew Hayes updated DATAFU-140:
---------------------------------
Fix Version/s: 1.4.0

> Remove MD5 hash from release task
> ---------------------------------
>
>                 Key: DATAFU-140
>                 URL: https://issues.apache.org/jira/browse/DATAFU-140
>             Project: DataFu
>          Issue Type: Task
>    Affects Versions: 1.3.3
>            Reporter: Matthew Hayes
>            Priority: Major
>             Fix For: 1.4.0
>
>
> MD5 should no longer be included in releases:
>  
> http://www.apache.org/dev/release-distribution





[jira] [Closed] (DATAFU-88) Port Stanford Core NLP Functionality to DataFu

2018-03-19 Thread Matthew Hayes (JIRA)

[ https://issues.apache.org/jira/browse/DATAFU-88?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew Hayes closed DATAFU-88.
-------------------------------
Resolution: Won't Do

> Port Stanford Core NLP Functionality to DataFu
> ----------------------------------------------
>
>                 Key: DATAFU-88
>                 URL: https://issues.apache.org/jira/browse/DATAFU-88
>             Project: DataFu
>          Issue Type: New Feature
>            Reporter: Russell Jurney
>            Priority: Major
>              Labels: lemmatizer, nlp, pig, pig_udf, stanford, stemmer
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> For starters I need the Stanford Core NLP stemmer and lemmatizer. 
> It looks like maybe I can add something generic and feed arguments to code 
> like: props.put("annotators", "tokenize, ssplit, pos, lemma");
> Helpful example of lemmatizing at 
> http://stackoverflow.com/questions/1578062/lemmatization-java





[jira] [Closed] (DATAFU-88) Port Stanford Core NLP Functionality to DataFu

2018-03-19 Thread Matthew Hayes (JIRA)

[ https://issues.apache.org/jira/browse/DATAFU-88?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew Hayes closed DATAFU-88.
-------------------------------
Resolution: Won't Do

> Port Stanford Core NLP Functionality to DataFu
> ----------------------------------------------
>
>                 Key: DATAFU-88
>                 URL: https://issues.apache.org/jira/browse/DATAFU-88
>             Project: DataFu
>          Issue Type: New Feature
>            Reporter: Russell Jurney
>            Assignee: Russell Jurney
>            Priority: Major
>              Labels: lemmatizer, nlp, pig, pig_udf, stanford, stemmer
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> For starters I need the Stanford Core NLP stemmer and lemmatizer. 
> It looks like maybe I can add something generic and feed arguments to code 
> like: props.put("annotators", "tokenize, ssplit, pos, lemma");
> Helpful example of lemmatizing at 
> http://stackoverflow.com/questions/1578062/lemmatization-java





[jira] [Commented] (DATAFU-88) Port Stanford Core NLP Functionality to DataFu

2018-03-19 Thread Eyal Allweil (JIRA)

[ https://issues.apache.org/jira/browse/DATAFU-88?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16404496#comment-16404496 ]

Eyal Allweil commented on DATAFU-88:


I'm fine with closing this.

> Port Stanford Core NLP Functionality to DataFu
> ----------------------------------------------
>
>                 Key: DATAFU-88
>                 URL: https://issues.apache.org/jira/browse/DATAFU-88
>             Project: DataFu
>          Issue Type: New Feature
>    Affects Versions: 1.3.0
>            Reporter: Russell Jurney
>            Assignee: Russell Jurney
>            Priority: Major
>              Labels: lemmatizer, nlp, pig, pig_udf, stanford, stemmer
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> For starters I need the Stanford Core NLP stemmer and lemmatizer. 
> It looks like maybe I can add something generic and feed arguments to code 
> like: props.put("annotators", "tokenize, ssplit, pos, lemma");
> Helpful example of lemmatizing at 
> http://stackoverflow.com/questions/1578062/lemmatization-java





Re: new datafu git repo

2018-03-19 Thread Eyal Allweil
I didn't notice that the GitHub repository redirected; I only noticed the
redirect on the main DataFu website. That's fine, and it makes the README
notice I was thinking of unnecessary.
 

On Monday, March 19, 2018, 7:49:44 AM GMT+2, Matthew Hayes wrote:
 
Not as far as I know.  I think it's best to have the old repo fail so that
people update to the new one and don't accidentally work off an old version.
I also see that https://github.com/apache/incubator-datafu redirects to
https://github.com/apache/datafu, which is pretty useful.

On Sun, Mar 18, 2018 at 1:32 AM, Eyal Allweil wrote:

Can we still make one more change to the old incubator git repository? I would
change its README to advise people to use the new repository, so they know not
to use the old one. It looks like GitHub will preserve both versions (which is
not necessarily a bad thing).
 

On Saturday, March 17, 2018, 1:41:12 AM GMT+2, Matthew Hayes wrote:
 
The DataFu git repo has been migrated [1] due to graduation from the
incubator.  The old repo
https://git-wip-us.apache.org/repos/asf/incubator-datafu.git will no longer
work.  Please use https://git-wip-us.apache.org/repos/asf/datafu.git going
forward.  If you already have the repo cloned, you should be able to edit
your .git/config file with this repo change (just remove "incubator-" from
the URL) and it should work as normal.


-Matt


[1] https://issues.apache.org/jira/browse/INFRA-16085
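
The .git/config change described in the migration announcement above amounts to a single URL substitution. A minimal sketch, assuming the clone's remote uses exactly the old HTTPS URL from that message; the function name and the sample config text are hypothetical:

```python
OLD_URL = "https://git-wip-us.apache.org/repos/asf/incubator-datafu.git"
NEW_URL = "https://git-wip-us.apache.org/repos/asf/datafu.git"


def migrate_git_config(config_text):
    """Return the config text with the old incubator remote URL replaced."""
    # A plain substring replacement leaves every other setting untouched.
    return config_text.replace(OLD_URL, NEW_URL)


# Hypothetical contents of a .git/config file from an existing clone.
sample_config = (
    '[remote "origin"]\n'
    "\turl = https://git-wip-us.apache.org/repos/asf/incubator-datafu.git\n"
    "\tfetch = +refs/heads/*:refs/remotes/origin/*\n"
)
updated_config = migrate_git_config(sample_config)
```

Running `git remote set-url origin` with the new URL should have the same effect without editing the file by hand, assuming the remote is named origin.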