Re: [accumulo] your /dist/ artifacts - 1 BAD signature

2014-09-25 Thread Corey Nolet
I see what happened. I was expecting the maven-release-plugin to push the
"prepare for next development iteration" commit, which it did not. I just pushed it
up and created the tag. I'll work on the release notes in a bit.

On Thu, Sep 25, 2014 at 3:33 PM, Christopher  wrote:

> [note: thread moved to dev@]
>
> Okay, I just confirmed that the current files in dist are the same ones in
> Maven Central, which are the same ones that we voted on. So, that issue is
> resolved. I double checked and saw that the gpg-signed tag hasn't been
> created for 1.6.1 (git tag -s 1.6.1 origin/1.6.1-rc1). I guess technically
> anybody could do this, and merge it (along with the version bump to
> 1.6.2-SNAPSHOT commit) to 1.6.2-SNAPSHOT branch (and forward, with -sours),
> if Corey doesn't have time/gets busy.
>
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>
> On Thu, Sep 25, 2014 at 2:21 PM, Corey Nolet  wrote:
>
> > There's still a few things I need to do before announcing the release to
> > the user list. Merging the rc into the next version branch was one of
> them
> > and creating the official release tag was another. I'll do these tonight
> as
> > well as writing up the release notes for the site.
> >
> >
> > On Thu, Sep 25, 2014 at 1:59 PM, Christopher 
> wrote:
> >
> > > Also, we can move this list to dev@. There's no reason for it to be
> > > private@
> > > .
> > >
> > >
> > > --
> > > Christopher L Tubbs II
> > > http://gravatar.com/ctubbsii
> > >
> > > On Thu, Sep 25, 2014 at 1:59 PM, Christopher 
> > wrote:
> > >
> > > > There's one more problem that Keith and I found... it doesn't look
> like
> > > > the rc1 branch got merged to 1.6.2-SNAPSHOT. I don't know if some
> other
> > > > branch got accidentally merged instead.
> > > >
> > > >
> > > > --
> > > > Christopher L Tubbs II
> > > > http://gravatar.com/ctubbsii
> > > >
> > > > On Thu, Sep 25, 2014 at 1:40 PM, Josh Elser 
> > > wrote:
> > > >
> > > >> Things look good to me now. I checked the artifacts on dist/ against
> > > what
> > > >> I have from evaluating the RC and they appear to match.
> > > >>
> > > >> Anything else we need to do here?
> > > >>
> > > >>
> > > >> Christopher wrote:
> > > >>
> > > >>> I was able to confirm the signature is bad. When I checked the RC,
> > the
> > > >>> signature was good, so I'm guessing the wrong one just got
> uploaded.
> > I
> > > >>> don't have a copy of the RC that I had previously downloaded, but I
> > was
> > > >>> able to grab a copy of what was deployed to Maven central and fix
> the
> > > >>> dist
> > > >>> sigs/checksums from that.
> > > >>>
> > > >>> Now, it's possible that the wrong artifacts were uploaded to Maven
> > > >>> central
> > > >>> (perhaps the wrong staging repo was promoted?) I can't know that
> for
> > > >>> sure,
> > > >>> until I can get to work and check my last download from the RC vote
> > and
> > > >>> compare with what's in Maven central now. If that is the case, then
> > we
> > > >>> need
> > > >>> to determine precisely what is different from this upload and what
> > was
> > > >>> voted on and see if we need to immediately re-release as 1.6.2 to
> fix
> > > the
> > > >>> problems.
> > > >>>
> > > >>>
> > > >>> --
> > > >>> Christopher L Tubbs II
> > > >>> http://gravatar.com/ctubbsii
> > > >>>
> > > >>> On Thu, Sep 25, 2014 at 3:12 AM, Henk Penning
> > > wrote:
> > > >>>
> > > >>>  Hi PMC accumulo,
> > > >>>>
> > > >>>>I watch 'www.apache.org/dist/', and I noticed that :
> > > >>>>
> > > >>>>-- you have 1 BAD pgp signature
> > > >>>>
> > > >>>> accumulo/1.6.1/accumulo-1.6.1-src.tar.gz.asc
> > > >>>>
> > > >>>>Please fix this problem soon ; for details, see
> > > >>>>
> > > >>>>
> > > http://people.apache.org/~henkp/checker/sig.html#project-accumulo
> > > >>>>  http://people.apache.org/~henkp/checker/md5.html
> > > >>>>
> > > >>>>For information on how to fix problems, see the faq :
> > > >>>>
> > > >>>>  http://people.apache.org/~henkp/checker/faq.html
> > > >>>>
> > > >>>>Thanks a lot, regards,
> > > >>>>
> > > >>>>Henk Penning -- apache.org infrastructure
> > > >>>>
> > > >>>>PS. The contents of this message is generated,
> > > >>>>but the mail itself is sent "by hand".
> > > >>>>PS. Please cc me on all relevant emails.
> > > >>>>
> > > >>>> -   _
> > > >>>> Henk P. Penning, ICT-beta  R Uithof WISK-412  _/ _
> > > >>>> Faculty of Science, Utrecht University T +31 30 253 4106 / _/
> > > >>>> Budapestlaan 6, 3584CD Utrecht, NL F +31 30 253 4553 _/ _/
> > > >>>> http://people.cs.uu.nl/henkp/  M penn...@uu.nl _/
> > > >>>>
> > > >>>>
> > > >>>
> > > >
> > >
> >
>


Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-25 Thread Corey Nolet
I'm seeing the behavior under Mac OS X and Fedora 19, and the test has been
consistently failing for me. I'm thinking ACCUMULO-3073. Since others are
able to get it to pass, I did not think it should fail the vote solely on
that, but I do think it needs attention, quickly.

On Thu, Sep 25, 2014 at 10:43 AM, Bill Havanki 
wrote:

> I haven't had an opportunity to try it again since my +1, but prior to that
> it has been consistently failing.
>
> - I tried extending the timeout on the test, but it would still time out.
> - I see the behavior on Mac OS X and under CentOS. (I wonder if it's a JVM
> thing?)
>
> On Wed, Sep 24, 2014 at 9:06 PM, Corey Nolet  wrote:
>
> > Vote passes with 4 +1's and no -1's.
> >
> > Bill, were you able to get the IT to run yet? I'm still having timeouts
> on
> > my end as well.
> >
> >
> > On Wed, Sep 24, 2014 at 1:41 PM, Josh Elser 
> wrote:
> >
> > > The crux of it is that both of the errors in the CRC were single bit
> > > "variants".
> > >
> > > y instead of 9 and p instead of 0
> > >
> > > Both of these cases are a '1' in the 0x40 bit of the byte
> > > instead of a '0'. We recognized these because y and p are outside of
> the
> > > hex range. Fixing both of these fixes the CRC error (manually
> verified).
> > >
> > > That's all we know right now. I'm currently running memtest86. I do not
> > > have ECC ram, so it *is* theoretically possible that was the cause.
> After
> > > running memtest for a day or so (or until I need my desktop functional
> > > again), I'll go back and see if I can reproduce this again.
> > >
> > >
> > > Mike Drob wrote:
> > >
> > >> Any chance the IRC chats can make it onto the ML for posterity?
> > >>
> > >> Mike
> > >>
> > >> On Wed, Sep 24, 2014 at 12:04 PM, Keith Turner
> > wrote:
> > >>
> > >>  On Wed, Sep 24, 2014 at 12:44 PM, Russ Weeks<
> rwe...@newbrightidea.com>
> > >>> wrote:
> > >>>
> > >>>  Interesting that "y" (0x79) and "9" (0x39) are one bit "away" from
> > each
> > >>>> other. I blame cosmic rays!
> > >>>>
> > >>>  It is interesting, and that's only half of the story.  It's been
> > >>> interesting
> > >>> chatting w/ Josh about this on irc and hearing about his findings.
> > >>>
> > >>>
> > >>>  On Wed, Sep 24, 2014 at 9:05 AM, Josh Elser
> > >>>>
> > >>> wrote:
> > >>>
> > >>>> The offending keys are:
> > >>>>>>>
> > >>>>>>> 389a85668b6ebf8e 2ff6:4a78 [] 1411499115242
> > >>>>>>>
> > >>>>>>> 3a10885b-d481-4d00-be00-0477e231ey65:8576b169:
> > >>>>>>> 0cd98965c9ccc1d0:ba15529e
> > >>>>>>>
> > >>>>>>>  The careful eye will notice that the UUID in the first component
> > of
> > >>>>> the
> > >>>>> value has a different suffix than the next corrupt key/value (ends
> > with
> > >>>>> "ey65" instead of "e965"). Fixing this in the Value and re-running
> > the
> > >>>>>
> > >>>> CRC
> > >>>>
> > >>>>> makes it pass.
> > >>>>>
> > >>>>>
> > >>>>>   and
> > >>>>>
> > >>>>>> 7e56b58a0c7df128 5fa0:6249 [] 1411499311578
> > >>>>>>>
> > >>>>>>> 3a10885b-d481-4d00-be00-0477e231e965:p000872d60eb:
> > >>>>>>> 499fa72752d82a7c:5c5f19e8
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>
> >
>
>
>
> --
> // Bill Havanki
> // Solutions Architect, Cloudera Govt Solutions
> // 443.686.9283
>


Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-24 Thread Corey Nolet
Vote passes with 4 +1's and no -1's.

Bill, were you able to get the IT to run yet? I'm still having timeouts on
my end as well.


On Wed, Sep 24, 2014 at 1:41 PM, Josh Elser  wrote:

> The crux of it is that both of the errors in the CRC were single bit
> "variants".
>
> y instead of 9 and p instead of 0
>
> Both of these cases are a '1' in the 0x40 bit of the byte
> instead of a '0'. We recognized these because y and p are outside of the
> hex range. Fixing both of these fixes the CRC error (manually verified).
>
> That's all we know right now. I'm currently running memtest86. I do not
> have ECC ram, so it *is* theoretically possible that was the cause. After
> running memtest for a day or so (or until I need my desktop functional
> again), I'll go back and see if I can reproduce this again.
>
>
> Mike Drob wrote:
>
>> Any chance the IRC chats can make it onto the ML for posterity?
>>
>> Mike
>>
>> On Wed, Sep 24, 2014 at 12:04 PM, Keith Turner  wrote:
>>
>>  On Wed, Sep 24, 2014 at 12:44 PM, Russ Weeks
>>> wrote:
>>>
>>>  Interesting that "y" (0x79) and "9" (0x39) are one bit "away" from each
 other. I blame cosmic rays!

  It is interesting, and that's only half of the story.  It's been
>>> interesting
>>> chatting w/ Josh about this on irc and hearing about his findings.
>>>
>>>
>>>  On Wed, Sep 24, 2014 at 9:05 AM, Josh Elser

>>> wrote:
>>>
 The offending keys are:
>>>
>>> 389a85668b6ebf8e 2ff6:4a78 [] 1411499115242
>>>
>>> 3a10885b-d481-4d00-be00-0477e231ey65:8576b169:
>>> 0cd98965c9ccc1d0:ba15529e
>>>
>>>  The careful eye will notice that the UUID in the first component of
> the
> value has a different suffix than the next corrupt key/value (ends with
> "ey65" instead of "e965"). Fixing this in the Value and re-running the
>
 CRC

> makes it pass.
>
>
>   and
>
>> 7e56b58a0c7df128 5fa0:6249 [] 1411499311578
>>>
>>> 3a10885b-d481-4d00-be00-0477e231e965:p000872d60eb:
>>> 499fa72752d82a7c:5c5f19e8
>>>
>>>
>>>
>>
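
A quick way to confirm the single-bit nature of the corruption described above ('y' vs '9', 'p' vs '0') is to XOR the byte values; a minimal sketch:

public class BitFlipCheck {
    public static void main(String[] args) {
        // 'y' (0x79) vs '9' (0x39) and 'p' (0x70) vs '0' (0x30):
        // both pairs differ only in the 0x40 bit.
        System.out.printf("y ^ 9 = 0x%02x%n", 'y' ^ '9'); // 0x40
        System.out.printf("p ^ 0 = 0x%02x%n", 'p' ^ '0'); // 0x40

        // Integer.bitCount() confirms exactly one flipped bit per byte.
        System.out.println(Integer.bitCount('y' ^ '9')); // 1
        System.out.println(Integer.bitCount('p' ^ '0')); // 1
    }
}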


Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-24 Thread Corey Nolet
Bill,

I've been having that same IT issue and said the same thing "It's not
happening to others". I lifted the timeout completely and it never finished.


On Wed, Sep 24, 2014 at 1:13 PM, Mike Drob  wrote:

> Any chance the IRC chats can make it onto the ML for posterity?
>
> Mike
>
> On Wed, Sep 24, 2014 at 12:04 PM, Keith Turner  wrote:
>
> > On Wed, Sep 24, 2014 at 12:44 PM, Russ Weeks 
> > wrote:
> >
> > > Interesting that "y" (0x79) and "9" (0x39) are one bit "away" from each
> > > other. I blame cosmic rays!
> > >
> >
> > It is interesting, and that's only half of the story.  It's been
> interesting
> > chatting w/ Josh about this on irc and hearing about his findings.
> >
> >
> > >
> > > On Wed, Sep 24, 2014 at 9:05 AM, Josh Elser 
> > wrote:
> > >
> > > >
> > > >>> The offending keys are:
> > > >>>
> > > >>> 389a85668b6ebf8e 2ff6:4a78 [] 1411499115242
> > > >>>
> > > >>> 3a10885b-d481-4d00-be00-0477e231ey65:8576b169:
> > > >>> 0cd98965c9ccc1d0:ba15529e
> > > >>>
> > > >>
> > > > The careful eye will notice that the UUID in the first component of
> the
> > > > value has a different suffix than the next corrupt key/value (ends
> with
> > > > "ey65" instead of "e965"). Fixing this in the Value and re-running
> the
> > > CRC
> > > > makes it pass.
> > > >
> > > >
> > > >  and
> > > >>>
> > > >>> 7e56b58a0c7df128 5fa0:6249 [] 1411499311578
> > > >>>
> > > >>> 3a10885b-d481-4d00-be00-0477e231e965:p000872d60eb:
> > > >>> 499fa72752d82a7c:5c5f19e8
> > > >>>
> > > >>>
> > >
> >
>


Re: [DISCUSS] Thinking about branch names

2014-09-23 Thread Corey Nolet
+1

Using separate branches in this manner just adds complexity. I was
wondering myself why we needed to create separate branches when all we're
doing is tagging/deleting the already released ones. The only difference
between where one leaves off and another begins  is the name of the branch.


On Tue, Sep 23, 2014 at 9:04 AM, Christopher  wrote:

> +1 to static dev branch names per release series. (this would also fix the
> Jenkins spam when the builds break due to branch name changes)
>
> However, I kind of prefer 1.5.x or 1.5-dev, or similar, over simply 1.5,
> which looks so much like a release version that I wouldn't want it to
> generate any confusion.
>
> Also, for reference, here are a few git commands that might help some people
> avoid the situation that happened:
> git remote update
> git remote prune $(git remote)
> git config --global push.default current # git < 1.8
> git config --global push.default simple # git >= 1.8
>
> The situation seems to primarily have occurred because of some pushes that
> succeeded because the local clone was not aware that the remote branches
> had disappeared. Pruning will clean those up, so that you'll get an error
> if you try to push. Simple/current push strategy will ensure you don't push
> all matching branches by default. Josh's proposed solution makes it less
> likely the branches will disappear/change on a remote, but these are still
> useful git commands to be aware of, and are related enough to this
> situation that I thought I'd share.
>
>
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>
> On Mon, Sep 22, 2014 at 11:18 PM, Josh Elser  wrote:
>
> > After working on 1.5.2 and today's branch snafu, I think I've come to the
> > conclusion that our branch naming is more pain than it's worth (I
> believe I
> > was the one who primarily argued for branch names as they are currently
> > implemented, so take that as you want).
> >
> > * Trying to make a new branch for the "next" version as a release is
> > happening forces you to fight with Maven. Maven expects that your "next"
> is
> > going to be on the same branch and the way it makes commits and bumps
> > versions for you encourages this. Using a new branch for "next" is more
> > manual work for the release manager.
> >
> > * In the time after we make a release, there's a bit of confusion (I do it
> > too, just not publicly... yet) about "what branch do I put this fix for
> > _version_ in?". It's not uncommon to put it in the "old" branch instead
> of
> > the new one. The problem arises when the old branch has already been
> > deleted. If a developer has an old version of that branch, there's
> nothing
> > to tell them "hey, your copy of this branch is behind the remote's copy
> of
> > this branch. I'm not accepting your push!" Having a single branch for a
> > release line removes this hassle.
> >
> > "Pictorially", I'm thinking we would change from the active branches
> > {1.5.3-SNAPSHOT, 1.6.1-SNAPSHOT, 1.6.2-SNAPSHOT, master} to {1.5, 1.6,
> > master}. (where a git tag would exist for the 1.6.1 RCs).
> >
> > IIRC, the big argument for per-release branches was to encourage
> > frequent, targeted branches (I know the changes for this version go in
> this
> > branch). I think most of this can be mitigated by keeping up with
> frequent
> > releases and coordination with the individual cutting the release.
> >
> > In short, I'm of the opinion that we should drop the
> ".z-SNAPSHOT"
> > suffix from branch names (e.g. 1.5.3-SNAPSHOT) and move to a shorter
> "x.y"
> > (e.g. 1.5) that exists for the lifetime of that version. I think we could
> > also use this approach if/when we change our versioning to start using
> the
> > "x" component of "x.y.z".
> >
> > Thoughts?
> >
> > - Josh
> >
>


Re: Apache Storm Graduation to a TLP

2014-09-22 Thread Corey Nolet
Congrats!

On Mon, Sep 22, 2014 at 5:16 PM, P. Taylor Goetz  wrote:

> I’m pleased to announce that Apache Storm has graduated to a Top-Level
> Project (TLP), and I’d like to thank everyone in the Storm community for
> your contributions and help in achieving this important milestone.
>
> As part of the graduation process, a number of infrastructure changes have
> taken place:
>
> *New website url:* http://storm.apache.org
>
> *New git repo urls:*
>
> https://git-wip-us.apache.org/repos/asf/storm.git (for committer push)
>
> g...@github.com:apache/storm.git
> -or-
> https://github.com/apache/storm.git (for github pull requests)
>
> *Mailing Lists:*
> If you are already subscribed, your subscription has been migrated. New
> messages should be sent to the new address:
>
> [list]@storm.apache.org
>
> This includes any subscribe/unsubscribe requests.
>
> Note: The mail-archives.apache.org site will not reflect these changes
> until October 1.
>
>
> Most of these changes have already occurred and are seamless. Please
> update your git remotes and address books accordingly.
>
> - Taylor
>


Re: [VOTE] Apache Accumulo 1.6.1 RC1

2014-09-22 Thread Corey Nolet
Yeah I'll push it tonight.

On Mon, Sep 22, 2014 at 4:28 PM, Josh Elser  wrote:

> This appears to have been a snafu (related to the push-screwup). I'll try
> to restore if I have the branch locally, but you might have to re-push your
> branch, Corey (or anyone else who has the SHA1 listed in his original VOTE
> email).
>
>
> On 9/22/14, 1:26 PM, Josh Elser wrote:
>
>> Corey, I don't see the branch. Did you forget to push?
>>
>> On 9/19/14, 10:49 PM, Corey Nolet wrote:
>>
>>> Devs,
>>>
>>> Please consider the following candidate for Apache Accumulo 1.6.1
>>>
>>> Branch: 1.6.1-rc1
>>> SHA1: 88c5473b3b49d797d3dabebd12fe517e9b248ba2
>>> Staging Repository:
>>> *https://repository.apache.org/content/repositories/
>>> orgapacheaccumulo-1017/
>>>
>>> <https://repository.apache.org/content/repositories/
>>> orgapacheaccumulo-1017/>*
>>>
>>>
>>> Source tarball:
>>> *http://repository.apache.org/content/repositories/
>>> orgapacheaccumulo-1017/org/apache/accumulo/accumulo/1.6.
>>> 1/accumulo-1.6.1-src.tar.gz
>>>
>>> <http://repository.apache.org/content/repositories/
>>> orgapacheaccumulo-1017/org/apache/accumulo/accumulo/1.6.
>>> 1/accumulo-1.6.1-src.tar.gz>*
>>>
>>> Binary tarball:
>>> *http://repository.apache.org/content/repositories/
>>> orgapacheaccumulo-1017/org/apache/accumulo/accumulo/1.6.
>>> 1/accumulo-1.6.1-bin.tar.gz
>>>
>>> <http://repository.apache.org/content/repositories/
>>> orgapacheaccumulo-1017/org/apache/accumulo/accumulo/1.6.
>>> 1/accumulo-1.6.1-bin.tar.gz>*
>>>
>>> (Append ".sha1", ".md5" or ".asc" to download the signature/hash for a
>>> given artifact.)
>>>
>>> Signing keys available at: https://www.apache.org/dist/accumulo/KEYS
>>>
>>> Over 1.6.1, we have 188 issues resolved
>>> *https://git-wip-us.apache.org/repos/asf?p=accumulo.git;
>>> a=blob;f=CHANGES;h=91b9d31e3b9dc53f1a576cc49bbc061919eb0070;hb=1.6.1-rc1
>>>
>>> <https://git-wip-us.apache.org/repos/asf?p=accumulo.git;
>>> a=blob;f=CHANGES;h=91b9d31e3b9dc53f1a576cc49bbc061919eb0070;hb=1.6.1-rc1
>>> >*
>>>
>>>
>>> Testing: All unit and functional tests are passing.
>>>
>>> Vote will be open until Thursday, September 25th 12:00AM UTC (9/24 8:00PM
>>> ET, 9/24 5:00PM PT)
>>>
>>>


[VOTE] Apache Accumulo 1.6.1 RC1

2014-09-19 Thread Corey Nolet
Devs,

Please consider the following candidate for Apache Accumulo 1.6.1

Branch: 1.6.1-rc1
SHA1: 88c5473b3b49d797d3dabebd12fe517e9b248ba2
Staging Repository:
https://repository.apache.org/content/repositories/orgapacheaccumulo-1017/

Source tarball:
http://repository.apache.org/content/repositories/orgapacheaccumulo-1017/org/apache/accumulo/accumulo/1.6.1/accumulo-1.6.1-src.tar.gz

Binary tarball:
http://repository.apache.org/content/repositories/orgapacheaccumulo-1017/org/apache/accumulo/accumulo/1.6.1/accumulo-1.6.1-bin.tar.gz

(Append ".sha1", ".md5" or ".asc" to download the signature/hash for a
given artifact.)

Signing keys available at: https://www.apache.org/dist/accumulo/KEYS

Over 1.6.0, we have 188 issues resolved:
https://git-wip-us.apache.org/repos/asf?p=accumulo.git;a=blob;f=CHANGES;h=91b9d31e3b9dc53f1a576cc49bbc061919eb0070;hb=1.6.1-rc1

Testing: All unit and functional tests are passing.

Vote will be open until Thursday, September 25th 12:00AM UTC (9/24 8:00PM
ET, 9/24 5:00PM PT)


Re: [VOTE] Apache Accumulo 1.5.2 RC1

2014-09-18 Thread Corey Nolet
If we are concerned with confusion about adoption of new versions, we
should make a point to articulate the purpose very clearly in each of the
announcements. I was in the combined camp an hour ago and now I'm also
thinking we should keep them separate.


On Fri, Sep 19, 2014 at 1:16 AM, Josh Elser  wrote:

> No we did not bundle any release announcements prior. I also have to agree
> with Bill -- I don't really see how there would be confusion with a
> properly worded announcement.
>
> Happy to work with anyone who has concerns in this regard to come up with
> something that is agreeable. I do think they should be separate.
>
>
> On 9/19/14, 1:02 AM, Mike Drob wrote:
>
>> Did we bundle 1.5.1/1.6.0? If not, they were fairly close together, I
>> think. Historically, we have not done a great job of distinguishing our
>> release lines, so that has led to confusion. Maybe I'm on the path to
>> talking myself out of a combined announcement here.
>>
>> On Thu, Sep 18, 2014 at 9:57 PM, William Slacum <
>> wilhelm.von.cl...@accumulo.net> wrote:
>>
>>  Not to be a total jerk, but what's unclear about 1.5 < 1.6? Lots of
>>> projects have multiple release lines and it's not an issue.
>>>
>>> On Fri, Sep 19, 2014 at 12:18 AM, Mike Drob  wrote:
>>>
>>>  +1 to combining. I've already had questions about upgrading to "this

>>> latest
>>>
 release" from somebody currently on the 1.6 line. Our release narrative

>>> is
>>>
 not clear and we should not muddy the waters.

 On Thu, Sep 18, 2014 at 7:27 PM, Christopher 

>>> wrote:
>>>

  Should we wait to do a release announcement until 1.6.1, so we can
>
 batch
>>>
 the two?
>
> My main concern here is that I don't want to encourage new 1.5.x
>
 adoption
>>>
 when we have 1.6.x, and having two announcements could be confusing to
>
 new

> users who aren't sure which version to start using. We could issue an
> announcement that primarily mentions 1.6.1, and also mentions 1.5.2
>
 second.

> That way, people will see 1.6.x as the stable/focus release, but will
>
 still

> inform 1.5.x users of updates.
>
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>
> On Thu, Sep 18, 2014 at 10:20 PM, Josh Elser 
>
 wrote:

>
>  Vote passes with 3 +1's and nothing else. Huge thank you to those who
>>
> made
>
>> the time to participate.
>>
>> I'll finish up the rest of the release work tonight.
>>
>> On 9/15/14, 12:24 PM, Josh Elser wrote:
>>
>>  Devs,
>>>
>>> Please consider the following candidate for Apache Accumulo 1.5.2
>>>
>>> Tag: 1.5.2rc1
>>> SHA1: 039a2c28bdd474805f34ee33f138b009edda6c4c
>>> Staging Repository:
>>> https://repository.apache.org/content/repositories/
>>> orgapacheaccumulo-1014/
>>>
>>> Source tarball:
>>> http://repository.apache.org/content/repositories/
>>> orgapacheaccumulo-1014/org/apache/accumulo/accumulo/1.5.
>>> 2/accumulo-1.5.2-src.tar.gz
>>>
>>> Binary tarball:
>>> http://repository.apache.org/content/repositories/
>>> orgapacheaccumulo-1014/org/apache/accumulo/accumulo/1.5.
>>> 2/accumulo-1.5.2-bin.tar.gz
>>>
>>> (Append ".sha1", ".md5" or ".asc" to download the signature/hash
>>>
>> for a
>>>
 given artifact.)
>>>
>>> Signing keys available at:
>>>
>> https://www.apache.org/dist/accumulo/KEYS
>>>

>>> Over 1.5.1, we have 109 issues resolved
>>> https://git-wip-us.apache.org/repos/asf?p=accumulo.git;a=
>>> blob;f=CHANGES;h=c2892d6e9b1c6c9b96b2a58fc901a76363ece8b0;hb=
>>> 039a2c28bdd474805f34ee33f138b009edda6c4c
>>>
>>>
>>> Testing: all unit and functional tests are passing and ingested 1B
>>> entries using CI w/ agitation over rc0.
>>>
>>> Vote will be open until Friday, August 19th 12:00AM UTC (8/18 8:00PM
>>>
>> ET,

> 8/18 5:00PM PT)
>>>
>>> - Josh
>>>
>>>
>>
>

>>>
>>


Re: AccumuloMultiTableInputFormat IllegalStatementException

2014-09-17 Thread Corey Nolet
Have you looked at libjars? The uber jar is one approach, but it can
become very ugly very quickly.

http://grepalex.com/2013/02/25/hadoop-libjars/
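
As a quick illustration of the libjars route: -libjars is only honored when the driver goes through ToolRunner/GenericOptionsParser. A minimal sketch of such a driver (the class name, jar paths, and job wiring here are placeholders, not John's actual job):

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJobDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already contains anything GenericOptionsParser picked up,
        // including the jars passed via -libjars.
        Job job = Job.getInstance(getConf());
        job.setJarByClass(MyJobDriver.class);
        // ... set input/output formats, mapper, reducer, etc. ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // Launched as, for example:
        //   hadoop jar myjob.jar MyJobDriver \
        //     -libjars /path/to/accumulo-core.jar,/path/to/accumulo-fate.jar,... \
        //     <job args>
        // ToolRunner strips the generic options before run() sees args.
        System.exit(ToolRunner.run(new MyJobDriver(), args));
    }
}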

On Wed, Sep 17, 2014 at 10:59 PM, JavaHokie 
wrote:

> Hi Corey,
>
> I am now trying to deploy this @ work and I am unable to get this to run
> without putting accumulo-core, accumulo-fate, accumulo-trace,
> accumulo-tracer, and accumulo-tserver in the
> $HADOOP_COMMON_HOME/share/hadoop/common directory.  Can you tell me how you
> package your jar to obviate the need to put these jars here?
>
> Thanks
>
> --John
>
> On Sun, Aug 24, 2014 at 6:50 PM, Corey Nolet-2 [via Apache Accumulo] wrote:
>
>> Awesome John! It's good to have this documented for future users. Keep us
>> updated!
>>
>>
>> On Sun, Aug 24, 2014 at 11:05 AM, JavaHokie wrote:
>>
>>> Hi Corey,
>>>
>>> Just to wrap things up, AccumuloMultiTableInputFormat is working really
>>> well.  This is an outstanding feature I can leverage big-time on my
>>> current
>>> work assignment, an IRAD I am working on, as well as my own prototype
>>> project.
>>>
>>> Thanks again for your help!
>>>
>>> --John
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>
>


Re: Decouple topology configuration from code

2014-09-16 Thread Corey Nolet
Also, Trident is a DSL for rapidly producing useful analytics in Storm, and
I've been working on a DSL that makes stream processing for complex event
processing possible.

That one is located here:

https://github.com/calrissian/flowmix
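
For anyone who hasn't seen the Trident DSL, this is roughly what it looks like; a sketch of the standard streaming word count, assuming the Storm 0.9.x package layout and an existing batch spout that emits a "sentence" field:

import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import storm.trident.TridentTopology;
import storm.trident.operation.BaseFunction;
import storm.trident.operation.TridentCollector;
import storm.trident.operation.builtin.Count;
import storm.trident.spout.IBatchSpout;
import storm.trident.testing.MemoryMapState;
import storm.trident.tuple.TridentTuple;

public class TridentSketch {

    // Splits each incoming sentence into words.
    public static class Split extends BaseFunction {
        @Override
        public void execute(TridentTuple tuple, TridentCollector collector) {
            for (String word : tuple.getString(0).split(" ")) {
                collector.emit(new Values(word));
            }
        }
    }

    public static TridentTopology build(IBatchSpout spout) {
        TridentTopology topology = new TridentTopology();
        // Declarative pipeline: split, group, and maintain a running count.
        topology.newStream("sentences", spout)
            .each(new Fields("sentence"), new Split(), new Fields("word"))
            .groupBy(new Fields("word"))
            .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"));
        return topology;
    }
}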
On Sep 16, 2014 4:29 AM,  wrote:

>  Hi folks,
>
>
>
> Apache Camel has a number of DSLs which allow its topologies (routes wrt.
> Camel terminology) to be set up and configured easily.
>
> I am interested in such approach for Storm.
>
> I found java beans usage in:
>
> https://github.com/granthenke/storm-spring/
>
> but sounds fairly limited to me.
>
>
>
> Is there any other DSL-like initiative for Storm?
>
>
>
> My second concern is storm cluster management: we’d like to have a
> registry of topologies and be able to
> register/destroy/launch/suspend/kill/update registered topologies
> using a REST API.
>
>
>
> Is there any tool/initiative to support that ?
>
>
>
> Thx,
>
>
>
> /DV
>
>
>
> *Dominique Villard*
>
> *Architecte logiciel / Lead Developer*
> Orange/OF/DTSI/DSI/DFY/SDFY
>
> *tél. 04 97 46 30 03*
> dominique.vill...@orange.com 
>
>
>
>
>
> _
>
> This message and its attachments may contain confidential or privileged 
> information that may be protected by law;
> they should not be distributed, used or copied without authorisation.
> If you have received this email in error, please notify the sender and delete 
> this message and its attachments.
> As emails may be altered, Orange is not liable for messages that have been 
> modified, changed or falsified.
> Thank you.
>
>


Re: Decouple topology configuration from code

2014-09-16 Thread Corey Nolet
A while ago I wrote a Camel adapter for Storm so that spout inputs
could come from Camel. Not sure how useful it would be for you, but it's
located here:

https://github.com/calrissian/storm-recipes/blob/master/camel/src/main/java/org/calrissian/recipes/camel/spout/CamelConsumerSpout.java
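
The general shape of that adapter is a spout that polls a Camel ConsumerTemplate and emits whatever message bodies arrive. A minimal sketch of the pattern (not the storm-recipes implementation itself; it assumes the Storm 0.9.x API and some Camel endpoint URI such as "seda:tweets"):

import java.util.Map;

import org.apache.camel.CamelContext;
import org.apache.camel.ConsumerTemplate;
import org.apache.camel.impl.DefaultCamelContext;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

public class CamelSpoutSketch extends BaseRichSpout {

    private final String endpointUri;            // e.g. "seda:tweets"
    private transient CamelContext camel;
    private transient ConsumerTemplate consumer;
    private transient SpoutOutputCollector collector;

    public CamelSpoutSketch(String endpointUri) {
        this.endpointUri = endpointUri;
    }

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        try {
            // Start an embedded CamelContext inside the worker.
            camel = new DefaultCamelContext();
            camel.start();
            consumer = camel.createConsumerTemplate();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void nextTuple() {
        // Poll the Camel endpoint; a null body just means nothing arrived in time.
        String body = consumer.receiveBody(endpointUri, 100, String.class);
        if (body != null) {
            collector.emit(new Values(body));
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("body"));
    }
}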

Hi folks,



Apache Camel has a number of DSLs which allow its topologies (routes wrt.
Camel terminology) to be set up and configured easily.

I am interested in such approach for Storm.

I found java beans usage in:

https://github.com/granthenke/storm-spring/

but sounds fairly limited to me.



Is there any other DSL-like initiative for Storm?



My second concern is storm cluster management: we’d like to have a registry
of topologies and be able to
register/destroy/launch/suspend/kill/update registered topologies
using a REST API.



Is there any tool/initiative to support that ?



Thx,



/DV



*Dominique Villard*

*Architecte logiciel / Lead Developer*
Orange/OF/DTSI/DSI/DFY/SDFY

*tél. 04 97 46 30 03*
dominique.vill...@orange.com 





_

This message and its attachments may contain confidential or
privileged information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and
delete this message and its attachments.
As emails may be altered, Orange is not liable for messages that have
been modified, changed or falsified.
Thank you.


Re: Time to release 1.6.1?

2014-09-11 Thread Corey Nolet
I'm on it. I'll get a more formal vote going after I dig through the jira a
bit and note what's changed.

On Thu, Sep 11, 2014 at 11:06 AM, Christopher  wrote:

> Also, we can always have a 1.6.2 if there's outstanding bugfixes to release
> later.
>
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>
> On Thu, Sep 11, 2014 at 10:36 AM, Eric Newton 
> wrote:
>
> > +1 for 1.6.1.
> >
> > There are people testing a recent 1.6 branch at scale (100s of nodes),
> with
> > the intent of pushing it to production.
> >
> > I would rather have a released version in production.
> >
> > Thanks for volunteering.  Feel free to contact me if you need a hand with
> > anything.
> >
> > -Eric
> >
> >
> > On Wed, Sep 10, 2014 at 1:49 PM, Josh Elser 
> wrote:
> >
> > > Sure that's fine, Corey. Happy to help coordinate things with you.
> > > *Hopefully* it's not too painful :)
> > >
> > >
> > > On 9/10/14, 10:43 AM, Corey Nolet wrote:
> > >
> > >> I had posted this to the mailing list originally after a discussion
> with
> > >> Christopher at the Accumulo Summit hack-a-thon and because I wanted to
> > get
> > >> into the release process to help out.
> > >>
> > >> Josh, I still wouldn't mind getting together 1.6.1 if that's okay with
> > >> you.
> > >> If nothing else, it would get someone else following the procedures
> and
> > >> able to do the release.
> > >>
> > >> On Wed, Sep 10, 2014 at 1:22 PM, Josh Elser 
> > wrote:
> > >>
> > >>  That's exactly my plan, Christopher. Keith has been the man working
> on
> > a
> > >>> fix for ACCUMULO-1628 which is what I've been spinning on to get
> 1.5.2
> > >>> out
> > >>> the door. I want to spend a little time today looking at his patch to
> > >>> understand the fix and run some tests myself. Hopefully John can
> retest
> > >>> the
> > >>> patch as well since he had an environment that could reproduce the
> bug.
> > >>>
> > >>> Right after we get 1.5.2, I'm happy to work on 1.6.1 as well.
> > >>>
> > >>> - Josh
> > >>>
> > >>>
> > >>> On 9/10/14, 10:04 AM, Christopher wrote:
> > >>>
> > >>>  Because of ACCUMULO-2988 (upgrade path from 1.4.x --> 1.6.y, y >=
> 1),
> > >>>> I'm
> > >>>> hoping we can revisit this soon. Maybe get 1.5.2 out the door,
> > followed
> > >>>> by
> > >>>> 1.6.1 right away.
> > >>>>
> > >>>>
> > >>>> --
> > >>>> Christopher L Tubbs II
> > >>>> http://gravatar.com/ctubbsii
> > >>>>
> > >>>> On Fri, Jun 20, 2014 at 10:30 AM, Keith Turner 
> > >>>> wrote:
> > >>>>
> > >>>>   On Thu, Jun 19, 2014 at 11:46 AM, Josh Elser <
> josh.el...@gmail.com>
> > >>>>
> > >>>>> wrote:
> > >>>>>
> > >>>>>   I was thinking the same thing, but I also haven't made any
> strides
> > >>>>>
> > >>>>>>
> > >>>>>>  towards
> > >>>>>
> > >>>>>  getting 1.5.2 closer to happening (as I said I'd try to do).
> > >>>>>>
> > >>>>>> I still lack "physical" resources to do the week-long testing as
> our
> > >>>>>> guidelines currently force us to do. I still think this testing is
> > >>>>>> excessive if we're actually releasing bug-fixes, but it does
> > >>>>>>
> > >>>>>>  differentiate
> > >>>>>
> > >>>>>  us from other communities.
> > >>>>>>
> > >>>>>>
> > >>>>>>  I want to run some CI test because of the changes I made w/
> walog.
> > >>>>> I can
> > >>>>> run the test, but I would like to do that as late as possible.
>  Just
> > >>>>> let
> > >>>>> me know when you are thinking of cutting a release.
> > >>>>>
> > >>>>> Also, I would like to get 2827 in for the release.
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>  I'm really not sure how to approach this which is really why I've
> > been
> > >>>>>> stalling on it.
> > >>>>>>
> > >>>>>>
> > >>>>>> On 6/19/14, 7:18 AM, Mike Drob wrote:
> > >>>>>>
> > >>>>>>   I'd like to see 1.5.2 released first, just in case there are
> > issues
> > >>>>>> we
> > >>>>>>
> > >>>>>>> discover during that process that need to be addressed. Also, I
> > think
> > >>>>>>> it
> > >>>>>>> would be useful to resolve the discussion surrounding upgrades[1]
> > >>>>>>> before
> > >>>>>>> releasing.
> > >>>>>>>
> > >>>>>>> [1]:
> > >>>>>>> http://mail-archives.apache.org/mod_mbox/accumulo-dev/
> > >>>>>>> 201406.mbox/%3CCAGHyZ6LFuwH%3DqGF9JYpitOY9yYDG-
> > >>>>>>> sop9g6iq57VFPQRnzmyNQ%40mail.gmail.com%3E
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Thu, Jun 19, 2014 at 8:09 AM, Corey Nolet 
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>I'd like to start getting a candidate together if there are no
> > >>>>>>>
> > >>>>>>>  objections.
> > >>>>>>>>
> > >>>>>>>> It looks like we have 65 resolved tickets with a fix version of
> > >>>>>>>> 1.6.1.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>
> > >>>>
> > >>
> >
>


Re: Time to release 1.6.1?

2014-09-10 Thread Corey Nolet
I had posted this to the mailing list originally after a discussion with
Christopher at the Accumulo Summit hack-a-thon and because I wanted to get
into the release process to help out.

Josh, I still wouldn't mind getting together 1.6.1 if that's okay with you.
If nothing else, it would get someone else following the procedures and
able to do the release.

On Wed, Sep 10, 2014 at 1:22 PM, Josh Elser  wrote:

> That's exactly my plan, Christopher. Keith has been the man working on a
> fix for ACCUMULO-1628 which is what I've been spinning on to get 1.5.2 out
> the door. I want to spend a little time today looking at his patch to
> understand the fix and run some tests myself. Hopefully John can retest the
> patch as well since he had an environment that could reproduce the bug.
>
> Right after we get 1.5.2, I'm happy to work on 1.6.1 as well.
>
> - Josh
>
>
> On 9/10/14, 10:04 AM, Christopher wrote:
>
>> Because of ACCUMULO-2988 (upgrade path from 1.4.x --> 1.6.y, y >= 1), I'm
>> hoping we can revisit this soon. Maybe get 1.5.2 out the door, followed by
>> 1.6.1 right away.
>>
>>
>> --
>> Christopher L Tubbs II
>> http://gravatar.com/ctubbsii
>>
>> On Fri, Jun 20, 2014 at 10:30 AM, Keith Turner  wrote:
>>
>>  On Thu, Jun 19, 2014 at 11:46 AM, Josh Elser 
>>> wrote:
>>>
>>>  I was thinking the same thing, but I also haven't made any strides
>>>>
>>> towards
>>>
>>>> getting 1.5.2 closer to happening (as I said I'd try to do).
>>>>
>>>> I still lack "physical" resources to do the week-long testing as our
>>>> guidelines currently force us to do. I still think this testing is
>>>> excessive if we're actually releasing bug-fixes, but it does
>>>>
>>> differentiate
>>>
>>>> us from other communities.
>>>>
>>>>
>>> I want to run some CI test because of the changes I made w/ walog.  I can
>>> run the test, but I would like to do that as late as possible.   Just let
>>> me know when you are thinking of cutting a release.
>>>
>>> Also, I would like to get 2827 in for the release.
>>>
>>>
>>>
>>>> I'm really not sure how to approach this which is really why I've been
>>>> stalling on it.
>>>>
>>>>
>>>> On 6/19/14, 7:18 AM, Mike Drob wrote:
>>>>
>>>>  I'd like to see 1.5.2 released first, just in case there are issues we
>>>>> discover during that process that need to be addressed. Also, I think
>>>>> it
>>>>> would be useful to resolve the discussion surrounding upgrades[1]
>>>>> before
>>>>> releasing.
>>>>>
>>>>> [1]:
>>>>> http://mail-archives.apache.org/mod_mbox/accumulo-dev/
>>>>> 201406.mbox/%3CCAGHyZ6LFuwH%3DqGF9JYpitOY9yYDG-
>>>>> sop9g6iq57VFPQRnzmyNQ%40mail.gmail.com%3E
>>>>>
>>>>>
>>>>> On Thu, Jun 19, 2014 at 8:09 AM, Corey Nolet 
>>>>> wrote:
>>>>>
>>>>>   I'd like to start getting a candidate together if there are no
>>>>>
>>>>>> objections.
>>>>>>
>>>>>> It looks like we have 65 resolved tickets with a fix version of 1.6.1.
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>>


Re: Tablet server thrift issue

2014-09-01 Thread Corey Nolet
As an update,

I raised the tablet server memory and I have not seen this error thrown
since. I'd like to say raising the memory alone was the solution, but it
appears that I also may be having some performance issues with the switches
connecting the racks together. I'll update more as I dive in further.


On Fri, Aug 22, 2014 at 11:41 PM, Corey Nolet  wrote:

> Josh,
>
> Your advice is definitely useful- I also thought about catching the
> exception and retrying with a fresh batch writer but the fact that the
> batch writer failure doesn't go away without being re-instantiated is
> really only a nuisance. The TabletServerBatchWriter could be designed much
> better, I agree, but that is not the root of the problem.
>
> The Thrift exception that is causing the issue is what I'd like to get to
> the bottom of. It's throwing the following:
>
> *TApplicationException: applyUpdates failed: out of sequence response *
>
> I've never seen this exception before in regular use of the client API-
> but I also just updated to 1.6.0. Google isn't showing anything useful for
> how exactly this exception could come about other than using a bad
> threading model- and I don't see any drastic changes or other user
> complaints on the mailing list that would validate that line of thought.
> Quite frankly, I'm stumped. This could be a Thrift exception related to a
> Thrift bug or something bad on my system and have nothing to do with
> Accumulo.
>
> Chris Tubbs mentioned to me earlier that he recalled Keith and Eric had
> seen the exception before and may remember what it was/how they fixed it.
>
>
> On Fri, Aug 22, 2014 at 10:58 PM, Josh Elser  wrote:
>
>> Don't mean to tell you that I don't think there might be a bug/otherwise,
>> that's pretty much just the limit of what I know about the server-side
>> sessions :)
>>
>> If you have concrete "this worked in 1.4.4" and "this happens instead
>> with 1.6.0", that'd make a great ticket :D
>>
>> The BatchWriter failure case is pretty rough, actually. Eric has made
>> some changes to help already (in 1.6.1, I think), but it needs an overhaul
>> that I haven't been able to make time to fix properly, either. IIRC, the
>> only guarantee you have is that all mutations added before the last flush()
>> happened are durable on the server. Anything else is a guess. I don't know
>> the specifics, but that should be enough to work with (and saving off
>> mutations shouldn't be too costly since they're stored serialized).
>>
>>
>> On 8/22/14, 5:44 PM, Corey Nolet wrote:
>>
>>> Thanks Josh,
>>>
>>> I understand about the session ID completely but the problem I have is
>>> that
>>> the exact same client code worked, line for line, just fine in 1.4.4 and
>>> it's acting up in 1.6.0. I also seem to remember the BatchWriter
>>> automatically creating a new session when one expired without an
>>> exception
>>> causing it to fail on the client.
>>>
>>> I know we've made changes since 1.4.4 but I'd like to troubleshoot the
>>> actual issue of the BatchWriter failing due to the thrift exception
>>> rather
>>> than just catching the exception and trying mutations again. The other
>>> issue is that I've already submitted a bunch of mutations to the batch
>>> writer from different threads. Does that mean I need to be storing them
>>> off
>>> twice? (once in the BatchWriter's cache and once in my own)
>>>
>>> The BatchWriter in my ingester is constantly sending data and the tablet
>>> servers have been given more than enough memory to be able to keep up.
>>> There's no swap being used and the network isn't experiencing any errors.
>>>
>>>
>>> On Fri, Aug 22, 2014 at 4:54 PM, Josh Elser 
>>> wrote:
>>>
>>>  If you get an error from a BatchWriter, you pretty much have to throw
>>>> away
>>>> that instance of the BatchWriter and make a new one. See ACCUMULO-2990.
>>>> If
>>>> you want, you should be able to catch/recover from this without having
>>>> to
>>>> restart the ingester.
>>>>
>>>> If the session ID is invalid, my guess is that it hasn't been used
>>>> recently and the tserver cleaned it up. The exception logic isn't the
>>>> greatest (as it just is presented to you as a RTE).
>>>>
>>>> https://issues.apache.org/jira/browse/ACCUMULO-2990
>>>>
> > >>>>
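
For reference, the recover-without-restarting approach Josh describes (throw the failed BatchWriter away, create a new one, and replay everything added since the last flush()) looks roughly like the sketch below. It assumes the caller keeps its own copy of unflushed mutations; it is only the workaround pattern, not the eventual fix:

import java.util.List;

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.MutationsRejectedException;
import org.apache.accumulo.core.client.TableNotFoundException;
import org.apache.accumulo.core.data.Mutation;

public class ResilientWriterSketch {

    private final Connector connector;
    private final String table;
    private final BatchWriterConfig config = new BatchWriterConfig();
    private BatchWriter writer;

    public ResilientWriterSketch(Connector connector, String table) throws TableNotFoundException {
        this.connector = connector;
        this.table = table;
        this.writer = connector.createBatchWriter(table, config);
    }

    /** Writes and flushes a batch, recreating the BatchWriter once on failure. */
    public void writeBatch(List<Mutation> unflushed) throws Exception {
        try {
            for (Mutation m : unflushed) {
                writer.addMutation(m);
            }
            writer.flush();
        } catch (MutationsRejectedException e) {
            // Once a BatchWriter has failed it stays failed (see ACCUMULO-2990),
            // so close it, create a fresh one, and replay the unflushed mutations.
            try {
                writer.close();
            } catch (MutationsRejectedException ignored) {
                // already failed; nothing more to do with the old writer
            }
            writer = connector.createBatchWriter(table, config);
            for (Mutation m : unflushed) {
                writer.addMutation(m);
            }
            writer.flush();
        }
    }
}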

Re: AccumuloMultiTableInputFormat IllegalStatementException

2014-08-24 Thread Corey Nolet
Awesome John! It's good to have this documented for future users. Keep us
updated!


On Sun, Aug 24, 2014 at 11:05 AM, JavaHokie 
wrote:

> Hi Corey,
>
> Just to wrap things up, AccumuloMultiTableInputFormat is working really
> well.  This is an outstanding feature I can leverage big-time on my current
> work assignment, an IRAD I am working on, as well as my own prototype
> project.
>
> Thanks again for your help!
>
> --John
>
>
>
>


Re: AccumuloMultiTableInputFormat IllegalStatementException

2014-08-24 Thread Corey Nolet
I'm thinking this could be a yarn.application.classpath configuration
problem in your yarn-site.xml. I meant to ask earlier- how are you building
your jar that gets deployed? Are you shading it? Using libjars?



On Sun, Aug 24, 2014 at 6:56 AM, JavaHokie 
wrote:

> Hey Corey,
>
> Yah, sometimes ya just gotta go to the source code. :)
>
> It's a weird exception message...I am used to seeing NoClassDefFoundError
> and ClassNotFoundException.  It's also weird that the ReflectionException
> is
> no thrown, with NoClassDefFoundError or ClassNotFoundException as the root
> exception.
>
> Anyways, it's a classpath deal, but it's a weird one.  I thought maybe I
> had
> a 1.5 jar around somewhere, but the fact that the InputConfigurator--also
> new in Accumulo 1.6--can be found but InputTableConfig cannot is a bit
> puzzling.  But...us Java developers are used to figuring out classpath
> problems. Currently researching it on my end.
>
> Thanks again for all of your help so far--again, much appreciated.  Really
> excited to use this new feature.
>
> --John
>
>
>
>


Re: AccumuloMultiTableInputFormat IllegalStatementException

2014-08-23 Thread Corey Nolet
Awesome! I was going to recommend checking out the code last night so that
you could put some logging statements in there. You've probably noticed
this already, but MapWritable does not have static type parameters, so it
writes out the fully qualified class name of each value and uses it to
instantiate the value again via readFields() when deserializing.

That error is happening when that reflection occurs, though it doesn't
make much sense. The Accumulo mapreduce packages are obviously on the
classpath. If you are still having this issue, I'll keep looking more into
this as well.
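
For context, the submit side does roughly the following when it stuffs the table configs into the job configuration (a sketch of the idea, not the actual InputConfigurator code), which is why the remote task needs InputTableConfig on its classpath to read the map back:

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.accumulo.core.client.mapreduce.InputTableConfig;
import org.apache.commons.codec.binary.Base64;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;

public class TableConfigSerializationSketch {

    public static String serialize(String table, InputTableConfig config) throws IOException {
        MapWritable mapWritable = new MapWritable();
        mapWritable.put(new Text(table), config);

        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // MapWritable.write() records the value's fully qualified class name,
        // which is what readFields() later uses (via reflection) to rebuild
        // the map on the cluster.
        mapWritable.write(new DataOutputStream(baos));
        return Base64.encodeBase64String(baos.toByteArray());
    }
}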


On Aug 23, 2014 2:37 PM, "JavaHokie"  wrote:

> I checked out 1.6.0 from git and updated the exception handling for the
> getInputTableConfigs method, rebuilt, and tested my M/R jobs that use
> Accumulo as a source or sink just to ensure everything is still working
> correctly.
>
> I then updated the InputConfigurator.getInputTableConfig exception handling
> and I see the root cause is as follows:
>
> java.io.IOException: can't find class:
> org.apache.accumulo.core.client.mapreduce.InputTableConfig because
> org.apache.accumulo.core.client.mapreduce.InputTableConfig
> at
>
> org.apache.hadoop.io.AbstractMapWritable.readFields(AbstractMapWritable.java:212)
> at
> org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:169)
> at
>
> org.apache.accumulo.core.client.mapreduce.lib.impl.InputConfigurator.getInputTableConfigs(InputConfigurator.java:563)
> at
>
> org.apache.accumulo.core.client.mapreduce.lib.impl.InputConfigurator.validateOptions(InputConfigurator.java:644)
> at
>
> org.apache.accumulo.core.client.mapreduce.AbstractInputFormat.validateOptions(AbstractInputFormat.java:342)
> at
>
> org.apache.accumulo.core.client.mapreduce.AbstractInputFormat.getSplits(AbstractInputFormat.java:537)
> at
>
> org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:491)
> at
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:508)
> at
>
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:392)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
>
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
> at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1286)
>
> An IOException of the form "can't find class X because X" is a new
> one for me, but at least I have something specific to research.
>
> --John
>
>
>
>
>


Re: AccumuloMultiTableInputFormat IllegalStatementException

2014-08-22 Thread Corey Nolet
That code I posted should be able to validate where you are getting hung
up. Can you try running that on the machine and seeing if it prints the
expected tables/ranges?


Also, are you running the job live? What does the configuration look like
for the job on your resource manager? Can you see if the base64 matches?


On Sat, Aug 23, 2014 at 1:47 AM, JavaHokie 
wrote:

> H...the byte[] array is generated OK.
>
> byte[] bytes =
> Base64.decodeBase64(configString.getBytes(StandardCharsets.UTF_8));
>
> I wonder what's golng wrong with one of these lines below?
>
> ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
> mapWritable.readFields(new DataInputStream(bais));
>
>
>
>
>


Re: AccumuloMultiTableInputFormat IllegalStatementException

2014-08-22 Thread Corey Nolet
Sure. I was able to deserialize the base64 that you posted earlier and it
looks fine. The code I used to do this was like this:

byte[] serialized =
"AQEAOm9yZy5hcGFjaGUuYWNjdW11bG8uY29yZS5jbGllbnQubWFwcmVkdWNlLklucHV0VGFibGVDb25maWcCjAlmb2xsb3dpbmcBAAIAAAYGBgYxMDQ1ODeIf/8ABwcHBzEwNDU4NwCIf/8AAQYGBgYxMDUyNTWIf/8ABwcHBzEwNTI1NQCIf/8AAQAAAQAAjAx0d2l0dGVyZWRnZXMBAAIAAAYGBgYxMDQ1ODeIf/8ABwcHBzEwNDU4NwCIf/8AAQYGBgYxMDUyNTWIf/8ABwcHBzEwNTI1NQCIf/8AAQAAAQAA".getBytes(
Constants.UTF8);

byte[] decoded = Base64.decodeBase64(serialized);

MapWritable mapWritable = new MapWritable();
ByteArrayInputStream bais = new ByteArrayInputStream(decoded);
mapWritable.readFields(new DataInputStream(bais));
bais.close();

for(Map.Entry entry : mapWritable.entrySet()) {
  InputTableConfig config = (InputTableConfig)entry.getValue();
  log.debug(entry.getKey() + " - " + config.getRanges());
}

That IllegalStateException would not be getting thrown if the contents of
the input table config key in the configuration was null.


On Sat, Aug 23, 2014 at 1:31 AM, JavaHokie 
wrote:

> Agreed, should have used getConf(), I cleaned that up, so now things look
> like this:
>
> Job job = Job.getInstance(getConf());
>
> /*
>  * Set the basic stuff
>  */
> job.setJobName("TwitterJoin Query");
> job.setJarByClass(TwitterJoin.class);
>
> /*
>  * Set Mapper and Reducer Classes
>  */
> job.setMapperClass(TwitterJoinMapper.class);
> job.setReducerClass(TwitterJoinReducer.class);
>
> /*
>  * Set the Mapper MapOutputKeyClass and MapOutputValueClass
>  */
> job.setMapOutputKeyClass(Text.class);
> job.setMapOutputValueClass(Text.class);
>
> /*
>  * Set the Reducer OutputKeyClass and OutputValueClass
>  */
> job.setOutputKeyClass(Text.class);
> job.setOutputValueClass(Mutation.class);
>
> /*
>  * Set InputFormat and OutputFormat classes
>  */
>
> job.setInputFormatClass(AccumuloMultiTableInputFormat.class);
> job.setOutputFormatClass(AccumuloOutputFormat.class);
>
> /*
>  * Configure InputFormat and OutputFormat Classes
>  */
> Map configs = new
> HashMap();
>
> List ranges = Lists.newArrayList(new
> Range("104587"),new
> Range("105255"));
>
> InputTableConfig edgeConfig = new InputTableConfig();
> edgeConfig.setRanges(ranges);
> edgeConfig.setAutoAdjustRanges(true);
>
> InputTableConfig followerConfig = new InputTableConfig();
> followerConfig.setRanges(ranges);
> followerConfig.setAutoAdjustRanges(true);
>
> configs.put("following",followerConfig);
> configs.put("twitteredges",edgeConfig);
>
>
> AccumuloMultiTableInputFormat.setConnectorInfo(job,"root",new
> PasswordToken("!yost8932!".getBytes()));
>
>
> AccumuloMultiTableInputFormat.setZooKeeperInstance(job,"localhost","localhost");
> AccumuloMultiTableInputFormat.setScanAuthorizations(job,new
> Authorizations("private"));
> AccumuloMultiTableInputFormat.setInputTableConfigs(job,
> configs);
>
>
>
> log.debug(job.getConfiguration().get("AccumuloInputFormat.ScanOpts.TableConfigs"));
>
>
> AccumuloOutputFormat.setZooKeeperInstance(job,"localhost","localhost");
> AccumuloOutputFormat.setConnectorInfo(job,"root",new
> PasswordToken("!yost8932!".getBytes()));
> AccumuloOutputFormat.setCreateTables(job,true);
>
> AccumuloOutputFormat.setDefaultTableName(job,"twitteredgerollup");
>
> /*
>  * Kick off the job, wait for completion, and return
> applicable code
>  */
> boolean success = job.waitForCompletion(true);
>
> if (success) {
> return 0;
> }
>
> return 1;
>
> Still getting java.lang.IllegalStateException: The table query
> configurations could not be deserialized from the given configuration
>
> --John
>
>
>
>


Re: AccumuloMultiTableInputFormat IllegalStatementException

2014-08-22 Thread Corey Nolet
Also, if you don't mind me asking, why isn't your job setup class extending
Configured? That way you are picking up configurations injected from the
environment.

You would do "MyJobSetUpClass extends Configured" and then use getConf()
instead of newing up a new configuration.
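
A minimal sketch of that change against the run() method quoted below (assuming the class already implements Tool and is launched through ToolRunner, so the hard-coded properties can come from the environment instead):

public class TwitterJoin extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // Instead of: Configuration configuration = new Configuration();
        // pick up the configuration injected by ToolRunner/GenericOptionsParser,
        // so fs.defaultFS, the framework name, -libjars, etc. come from the
        // environment rather than being hard-coded.
        Job job = Job.getInstance(getConf());
        job.setJobName("TwitterJoin Query");
        job.setJarByClass(TwitterJoin.class);
        // ... the rest of the job setup stays the same ...
        return job.waitForCompletion(true) ? 0 : 1;
    }
}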


On Sat, Aug 23, 2014 at 1:11 AM, Corey Nolet  wrote:

> Job.getInstance(configuration) copies the configuration and makes its own.
> Try doing your debug statement from earlier on job.getConfiguration() and
> let's see what the base64 string looks like.
>
>
>
> On Sat, Aug 23, 2014 at 1:00 AM, JavaHokie 
> wrote:
>
>> Sure thing, here's my run method implementation:
>>
>> Configuration configuration = new Configuration();
>>
>> configuration.set("fs.defaultFS", "hdfs://127.0.0.1:8020");
>> configuration.set("mapreduce.job.tracker", "localhost:54311");
>> configuration.set("mapreduce.framework.name", "yarn");
>> configuration.set("yarn.resourcemanager.address",
>> "localhost:8032");
>>
>> Job job = Job.getInstance(configuration);
>>
>> /*
>>  * Set the basic stuff
>>  */
>> job.setJobName("TwitterJoin Query");
>> job.setJarByClass(TwitterJoin.class);
>>
>> /*
>>  * Set Mapper and Reducer Classes
>>  */
>> job.setMapperClass(TwitterJoinMapper.class);
>> job.setReducerClass(TwitterJoinReducer.class);
>>
>> /*
>>  * Set the Mapper MapOutputKeyClass and MapOutputValueClass
>>  */
>> job.setMapOutputKeyClass(Text.class);
>> job.setMapOutputValueClass(Text.class);
>>
>> /*
>>  * Set the Reducer OutputKeyClass and OutputValueClass
>>  */
>> job.setOutputKeyClass(Text.class);
>> job.setOutputValueClass(Mutation.class);
>>
>> /*
>>  * Set InputFormat and OutputFormat classes
>>  */
>>
>> job.setInputFormatClass(AccumuloMultiTableInputFormat.class);
>> job.setOutputFormatClass(AccumuloOutputFormat.class);
>>
>> /*
>>  * Configure InputFormat and OutputFormat Classes
>>  */
>> Map configs = new
>> HashMap();
>>
>> List ranges = Lists.newArrayList(new
>> Range("104587"),new
>> Range("105255"));
>>
>> InputTableConfig edgeConfig = new InputTableConfig();
>> edgeConfig.setRanges(ranges);
>> edgeConfig.setAutoAdjustRanges(true);
>>
>> InputTableConfig followerConfig = new InputTableConfig();
>> followerConfig.setRanges(ranges);
>> followerConfig.setAutoAdjustRanges(true);
>>
>> configs.put("following",followerConfig);
>> configs.put("twitteredges",edgeConfig);
>>
>>
>> AccumuloMultiTableInputFormat.setConnectorInfo(job,"root",new
>> PasswordToken("".getBytes()));
>>
>>
>> AccumuloMultiTableInputFormat.setZooKeeperInstance(job,"localhost","localhost");
>>
>> AccumuloMultiTableInputFormat.setScanAuthorizations(job,new
>> Authorizations("private"));
>> AccumuloMultiTableInputFormat.setInputTableConfigs(job,
>> configs);
>>
>>
>> AccumuloOutputFormat.setZooKeeperInstance(job,"localhost","localhost");
>> AccumuloOutputFormat.setConnectorInfo(job,"root",new
>> PasswordToken("".getBytes()));
>> AccumuloOutputFormat.setCreateTables(job,true);
>>
>> AccumuloOutputFormat.setDefaultTableName(job,"twitteredgerollup");
>>
>> /*
>>  * Kick off the job, wait for completion, and return
>> applicable code
>>  */
>> boolean success = job.waitForCompletion(true);
>>
>> if (success) {
>> return 0;
>> }
>>
>> return 1;
>> }
>>
>>
>>
>>
>
>


Re: AccumuloMultiTableInputFormat IllegalStateException

2014-08-22 Thread Corey Nolet
Job.getInstance(configuration) copies the configuration and makes its own.
Try doing your debug statement from earlier on job.getConfiguration() and
let's see what the base64 string looks like.



On Sat, Aug 23, 2014 at 1:00 AM, JavaHokie 
wrote:

> Sure thing, here's my run method implementation:
>
> Configuration configuration = new Configuration();
>
> configuration.set("fs.defaultFS", "hdfs://127.0.0.1:8020");
> configuration.set("mapreduce.job.tracker", "localhost:54311");
> configuration.set("mapreduce.framework.name", "yarn");
> configuration.set("yarn.resourcemanager.address",
> "localhost:8032");
>
> Job job = Job.getInstance(configuration);
>
> /*
>  * Set the basic stuff
>  */
> job.setJobName("TwitterJoin Query");
> job.setJarByClass(TwitterJoin.class);
>
> /*
>  * Set Mapper and Reducer Classes
>  */
> job.setMapperClass(TwitterJoinMapper.class);
> job.setReducerClass(TwitterJoinReducer.class);
>
> /*
>  * Set the Mapper MapOutputKeyClass and MapOutputValueClass
>  */
> job.setMapOutputKeyClass(Text.class);
> job.setMapOutputValueClass(Text.class);
>
> /*
>  * Set the Reducer OutputKeyClass and OutputValueClass
>  */
> job.setOutputKeyClass(Text.class);
> job.setOutputValueClass(Mutation.class);
>
> /*
>  * Set InputFormat and OutputFormat classes
>  */
>
> job.setInputFormatClass(AccumuloMultiTableInputFormat.class);
> job.setOutputFormatClass(AccumuloOutputFormat.class);
>
> /*
>  * Configure InputFormat and OutputFormat Classes
>  */
> Map<String, InputTableConfig> configs = new
> HashMap<String, InputTableConfig>();
>
> List<Range> ranges = Lists.newArrayList(new
> Range("104587"), new
> Range("105255"));
>
> InputTableConfig edgeConfig = new InputTableConfig();
> edgeConfig.setRanges(ranges);
> edgeConfig.setAutoAdjustRanges(true);
>
> InputTableConfig followerConfig = new InputTableConfig();
> followerConfig.setRanges(ranges);
> followerConfig.setAutoAdjustRanges(true);
>
> configs.put("following",followerConfig);
> configs.put("twitteredges",edgeConfig);
>
>
> AccumuloMultiTableInputFormat.setConnectorInfo(job,"root",new
> PasswordToken("".getBytes()));
>
>
> AccumuloMultiTableInputFormat.setZooKeeperInstance(job,"localhost","localhost");
> AccumuloMultiTableInputFormat.setScanAuthorizations(job,new
> Authorizations("private"));
> AccumuloMultiTableInputFormat.setInputTableConfigs(job,
> configs);
>
>
> AccumuloOutputFormat.setZooKeeperInstance(job,"localhost","localhost");
> AccumuloOutputFormat.setConnectorInfo(job,"root",new
> PasswordToken("".getBytes()));
> AccumuloOutputFormat.setCreateTables(job,true);
>
> AccumuloOutputFormat.setDefaultTableName(job,"twitteredgerollup");
>
> /*
>  * Kick off the job, wait for completion, and return
> applicable code
>  */
> boolean success = job.waitForCompletion(true);
>
> if (success) {
> return 0;
> }
>
> return 1;
> }
>
>
>
> --
> View this message in context:
> http://apache-accumulo.1065345.n5.nabble.com/AccumuloMultiTableInputFormat-IllegalStateException-tp11186p11193.html
> Sent from the Users mailing list archive at Nabble.com.
>


Re: I am getting this connection loss error repeatedly from zookeeper. can you please suggest me?

2014-08-22 Thread Corey Nolet
Are you able to verify that a connection to Zookeeper is ever made? Is Storm
running for you at all?


On Sat, Aug 23, 2014 at 12:39 AM, Georgy Abraham 
wrote:

> Storm is trying to run a zookeeper on your machine at 10.61.251.98:2000.
> The code is not able to connect. Maybe that port is already used by
> someone else? Or Zookeeper cannot start? Or Zookeeper is not started.
> Investigate along these lines and hope it solves your problem.
>
>
> On Thu, Aug 21, 2014 at 4:31 PM, M.Tarkeshwar Rao 
> wrote:
>
>> Hi all ,
>>
>>
>>
>> I am getting this error repeatedly. can you please suggest me?
>>
>> org.apache.zookeeper.KeeperException$ConnectionLossException:
>> KeeperErrorCode = ConnectionLoss
>>
>> at
>> com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:72)
>> ~[curator-client-1.0.1.jar:na]
>>
>> at
>> com.netflix.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:74)
>> [curator-client-1.0.1.jar:na]
>>
>> at
>> com.netflix.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:353)
>> [curator-framework-1.0.1.jar:na]
>>
>> at
>> com.netflix.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:149)
>> [curator-framework-1.0.1.jar:na]
>>
>> at
>> com.netflix.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:138)
>> [curator-framework-1.0.1.jar:na]
>>
>> at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:85)
>> [curator-client-1.0.1.jar:na]
>>
>> at
>> com.netflix.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:134)
>> [curator-framework-1.0.1.jar:na]
>>
>> at
>> com.netflix.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:125)
>> [curator-framework-1.0.1.jar:na]
>>
>> at
>> com.netflix.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:34)
>> [curator-framework-1.0.1.jar:na]
>>
>> at
>> com.ericsson.mm.storm.zookeeper.service.TransactionErrorHandlerZK.init(TransactionErrorHandlerZK.java:67)
>> [mm-storm.jar:na]
>>
>> at
>> com.ericsson.mm.storm.zookeeper.service.TransactionErrorHandlerZK.connect(TransactionErrorHandlerZK.java:51)
>> [mm-storm.jar:na]
>>
>> at
>> com.ericsson.mm.storm.zookeeper.service.ZookeeperAdapter.connect(ZookeeperAdapter.java:89)
>> [mm-storm.jar:na]
>>
>> at
>> com.ericsson.mm.storm.utils.ZookeeperFacade.createConnectionToZk(ZookeeperFacade.java:42)
>> [mm-storm.jar:na]
>>
>> at
>> com.ericsson.mm.storm.transactional.bolt.impl.TransactionalBatchDataCreator.prepare(TransactionalBatchDataCreator.java:111)
>> [mm-storm.jar:na]
>>
>> at
>> backtype.storm.topology.BasicBoltExecutor.prepare(BasicBoltExecutor.java:26)
>> [storm-core-0.9.0.1.jar:na]
>>
>> at
>> backtype.storm.daemon.executor$fn__3498$fn__3510.invoke(executor.clj:674)
>> [storm-core-0.9.0.1.jar:na]
>>
>> at backtype.storm.util$async_loop$fn__444.invoke(util.clj:401)
>> [storm-core-0.9.0.1.jar:na]
>>
>> at clojure.lang.AFn.run(AFn.java:24) [clojure-1.4.0.jar:na]
>>
>> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
>>
>> 2014-08-21 18:13:06 o.a.z.ClientCnxn [INFO] Opening socket connection to
>> server bl460cx2378/10.61.251.98:2000
>>
>> 2014-08-21 18:13:06 o.a.z.ClientCnxn [WARN] Session 0x0 for server null,
>> unexpected error, closing socket connection and attempting reconnect
>>
>> java.net.ConnectException: Connection refused
>>
>> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>> ~[na:1.7.0_45]
>>
>> at
>> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:735)
>> ~[na:1.7.0_45]
>>
>> at
>> org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
>> ~[zookeeper-3.3.3.jar:3.3.3-1073969]
>>
>>
>> Regards
>>
>> Tarkeshwar
>>
>
>


Re: AccumuloMultiTableInputFormat IllegalStateException

2014-08-22 Thread Corey Nolet
The tests I'm running aren't using the native Hadoop libs either. If you
don't mind, a little more code showing how you are setting up your job would
be useful. It's weird that the key in the config would be null. Are you
using job.getConfiguration()?


On Sat, Aug 23, 2014 at 12:31 AM, JavaHokie 
wrote:

> Hey Corey,
>
> Gotcha, I get a null when I attempt to log the value:
>
> log.debug(configuration.get("AccumuloInputFormat.ScanOpts.TableConfigs"));
>
> --John
>
>
>
> --
> View this message in context:
> http://apache-accumulo.1065345.n5.nabble.com/AccumuloMultiTableInputFormat-IllegalStateException-tp11186p11191.html
> Sent from the Users mailing list archive at Nabble.com.
>


Re: AccumuloMultiTableInputFormat IllegalStateException

2014-08-22 Thread Corey Nolet
The table configs get serialized as base64 and placed in the job's
Configuration under the key "AccumuloInputFormat.ScanOpts.TableConfigs".
Could you verify/print what's being placed in this key in your
configuration?
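
For example, something like this right after the call to
setInputTableConfigs() (just a sketch; the key string is the one mentioned
above):

// Should print a base64 string rather than null if the table configs were
// set on the same Configuration the job will actually submit.
System.out.println(job.getConfiguration()
        .get("AccumuloInputFormat.ScanOpts.TableConfigs"));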



On Sat, Aug 23, 2014 at 12:15 AM, JavaHokie 
wrote:

> Hey Corey,
>
> Sure thing!  Here is my code:
>
> Map<String, InputTableConfig> configs = new
> HashMap<String, InputTableConfig>();
>
> List<Range> ranges = Lists.newArrayList(new
> Range("104587"), new
> Range("105255"));
>
> InputTableConfig edgeConfig = new InputTableConfig();
> edgeConfig.setRanges(ranges);
>
> InputTableConfig followerConfig = new InputTableConfig();
> followerConfig.setRanges(ranges);
>
> configs.put("following",followerConfig);
> configs.put("twitteredges",edgeConfig);
>
> These are the row values I am using to join entries from the following and
> twitteredges tables.
>
> --John
>
>
>
> --
> View this message in context:
> http://apache-accumulo.1065345.n5.nabble.com/AccumuloMultiTableInputFormat-IllegalStateException-tp11186p11189.html
> Sent from the Users mailing list archive at Nabble.com.
>


Re: AccumuloMultiTableInputFormat IllegalStateException

2014-08-22 Thread Corey Nolet
Hey John,

Could you give an example of one of the ranges you are using which causes
this to happen?


On Fri, Aug 22, 2014 at 11:02 PM, John Yost 
wrote:

> Hey Everyone,
>
> The AccumuloMultiTableInputFormat is an awesome addition to the Accumulo
> API and I am really excited to start using it.
>
> My first attempt with the 1.6.0 release resulted in this
> IllegalStateException:
>
> java.lang.IllegalStateException: The table query configurations could not
> be deserialized from the given configuration
>
> at
> org.apache.accumulo.core.client.mapreduce.lib.impl.InputConfigurator.getInputTableConfigs(InputConfigurator.java:566)
>
> at
> org.apache.accumulo.core.client.mapreduce.lib.impl.InputConfigurator.validateOptions(InputConfigurator.java:628)
>
> at
> org.apache.accumulo.core.client.mapreduce.AbstractInputFormat.validateOptions(AbstractInputFormat.java:342)
>
> at
> org.apache.accumulo.core.client.mapreduce.AbstractInputFormat.getSplits(AbstractInputFormat.java:537)
>
> at
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:508)
>
> at
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:392)
>
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
>
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
>
> at java.security.AccessController.doPrivileged(Native Method)
>
> at javax.security.auth.Subject.doAs(Subject.java:415)
>
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
>
> at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1286)
>
> at
> com.johnyostanalytics.mapreduce.client.TwitterJoin.run(TwitterJoin.java:104)
>
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>
> when I attempt to initialize the AccumuloMultiTableInputFormat:
>
> InputTableConfig baseConfig = new InputTableConfig();
> baseConfig.setRanges(ranges);
>
> InputTableConfig edgeConfig = new InputTableConfig();
> edgeConfig.setRanges(ranges);
> configs.put("base", baseConfig);
> configs.put("edges",edgeConfig);
>
> AccumuloMultiTableInputFormat.setInputTableConfigs(job, configs);
>
> Any ideas as to what may be going on?  I know that the table names are
> valid and that the Range objects are valid because I tested all of that
> independently via Accumulo scans.
>
> Any guidance is greatly appreciated because, again,
> AccumuloMultiTableInputFormat is really cool and I am really looking forward
> to using it.
>
> Thanks
>
> --John
>


Re: Tablet server thrift issue

2014-08-22 Thread Corey Nolet
Josh,

Your advice is definitely useful- I also thought about catching the
exception and retrying with a fresh batch writer but the fact that the
batch writer failure doesn't go away without being re-instantiated is
really only a nuisance. The TabletServerBatchWriter could be designed much
better, I agree, but that is not the root of the problem.

The Thrift exception that is causing the issue is what I'd like to get to
the bottom of. It's throwing the following:

*TApplicationException: applyUpdates failed: out of sequence response *

I've never seen this exception before in regular use of the client API- but
I also just updated to 1.6.0. Google isn't showing anything useful for how
exactly this exception could come about other than using a bad threading
model- and I don't see any drastic changes or other user complaints on the
mailing list that would validate that line of thought. Quite frankly, I'm
stumped. This could be a Thrift exception related to a Thrift bug or
something bad on my system and have nothing to do with Accumulo.

Chris Tubbs mentioned to me earlier that he recalled Keith and Eric had
seen the exception before and may remember what it was/how they fixed it.


On Fri, Aug 22, 2014 at 10:58 PM, Josh Elser  wrote:

> Don't mean to tell you that I don't think there might be a bug/otherwise,
> that's pretty much just the limit of what I know about the server-side
> sessions :)
>
> If you have concrete "this worked in 1.4.4" and "this happens instead with
> 1.6.0", that'd make a great ticket :D
>
> The BatchWriter failure case is pretty rough, actually. Eric has made some
> changes to help already (in 1.6.1, I think), but it needs an overhaul that
> I haven't been able to make time to fix properly, either. IIRC, the only
> guarantee you have is that all mutations added before the last flush()
> happened are durable on the server. Anything else is a guess. I don't know
> the specifics, but that should be enough to work with (and saving off
> mutations shouldn't be too costly since they're stored serialized).
>
>
> On 8/22/14, 5:44 PM, Corey Nolet wrote:
>
>> Thanks Josh,
>>
>> I understand about the session ID completely but the problem I have is
>> that
>> the exact same client code worked, line for line, just fine in 1.4.4 and
>> it's acting up in 1.6.0. I also seem to remember the BatchWriter
>> automatically creating a new session when one expired without an exception
>> causing it to fail on the client.
>>
>> I know we've made changes since 1.4.4 but I'd like to troubleshoot the
>> actual issue of the BatchWriter failing due to the thrift exception rather
>> than just catching the exception and trying mutations again. The other
>> issue is that I've already submitted a bunch of mutations to the batch
>> writer from different threads. Does that mean I need to be storing them
>> off
>> twice? (once in the BatchWriter's cache and once in my own)
>>
>> The BatchWriter in my ingester is constantly sending data and the tablet
>> servers have been given more than enough memory to be able to keep up.
>> There's no swap being used and the network isn't experiencing any errors.
>>
>>
>> On Fri, Aug 22, 2014 at 4:54 PM, Josh Elser  wrote:
>>
>>  If you get an error from a BatchWriter, you pretty much have to throw
>>> away
>>> that instance of the BatchWriter and make a new one. See ACCUMULO-2990.
>>> If
>>> you want, you should be able to catch/recover from this without having to
>>> restart the ingester.
>>>
>>> If the session ID is invalid, my guess is that it hasn't been used
>>> recently and the tserver cleaned it up. The exception logic isn't the
>>> greatest (as it just is presented to you as a RTE).
>>>
>>> https://issues.apache.org/jira/browse/ACCUMULO-2990
>>>
>>>
>>> On 8/22/14, 4:35 PM, Corey Nolet wrote:
>>>
>>>  Eric & Keith, Chris mentioned to me that you guys have seen this issue
>>>> before. Any ideas from anyone else are much appreciated as well.
>>>>
>>>> I recently updated a project's dependencies to Accumulo 1.6.0 built with
>>>> Hadoop 2.3.0. I've got CDH 5.0.2 deployed. The project has an ingest
>>>> component which is running all the time with a batch writer using many
>>>> threads to push mutations into Accumulo.
>>>>
>>>> The issue I'm having is a show stopper. At different intervals of time,
>>>> sometimes an hour, sometimes 30 minutes, I'm getting
>>

Re: Tablet server thrift issue

2014-08-22 Thread Corey Nolet
Thanks Josh,

I understand about the session ID completely but the problem I have is that
the exact same client code worked, line for line, just fine in 1.4.4 and
it's acting up in 1.6.0. I also seem to remember the BatchWriter
automatically creating a new session when one expired without an exception
causing it to fail on the client.

I know we've made changes since 1.4.4 but I'd like to troubleshoot the
actual issue of the BatchWriter failing due to the thrift exception rather
than just catching the exception and trying mutations again. The other
issue is that I've already submitted a bunch of mutations to the batch
writer from different threads. Does that mean I need to be storing them off
twice? (once in the BatchWriter's cache and once in my own)

The BatchWriter in my ingester is constantly sending data and the tablet
servers have been given more than enough memory to be able to keep up.
There's no swap being used and the network isn't experiencing any errors.


On Fri, Aug 22, 2014 at 4:54 PM, Josh Elser  wrote:

> If you get an error from a BatchWriter, you pretty much have to throw away
> that instance of the BatchWriter and make a new one. See ACCUMULO-2990. If
> you want, you should be able to catch/recover from this without having to
> restart the ingester.
>
> If the session ID is invalid, my guess is that it hasn't been used
> recently and the tserver cleaned it up. The exception logic isn't the
> greatest (as it just is presented to you as a RTE).
>
> https://issues.apache.org/jira/browse/ACCUMULO-2990
>
>
> On 8/22/14, 4:35 PM, Corey Nolet wrote:
>
>> Eric & Keith, Chris mentioned to me that you guys have seen this issue
>> before. Any ideas from anyone else are much appreciated as well.
>>
>> I recently updated a project's dependencies to Accumulo 1.6.0 built with
>> Hadoop 2.3.0. I've got CDH 5.0.2 deployed. The project has an ingest
>> component which is running all the time with a batch writer using many
>> threads to push mutations into Accumulo.
>>
>> The issue I'm having is a show stopper. At different intervals of time,
>> sometimes an hour, sometimes 30 minutes, I'm getting
>> MutationsRejectedExceptions (server errors) from the
>> TabletServerBatchWriter. Once they start, I need to restart the ingester
>> to
>> get them to stop. They always come back within 30 minutes to an hour...
>> rinse, repeat.
>>
>> The exception always happens on different tablet servers. It's a thrift
>> error saying a message was received out of sequence. In the TabletServer
>> logs, I see an "Invalid session id" exception which happens only once
>> before the client-side batch writer starts spitting out the MREs.
>>
>> I'm running some heavyweight processing in Storm alongside the tablet
>> servers. I shut that processing off in hopes that maybe it was the culprit
>> but that hasn't fixed the issue.
>>
>> I'm surprised I haven't seen any other posts on the topic.
>>
>> Thanks!
>>
>>


Tablet server thrift issue

2014-08-22 Thread Corey Nolet
Eric & Keith, Chris mentioned to me that you guys have seen this issue
before. Any ideas from anyone else are much appreciated as well.

I recently updated a project's dependencies to Accumulo 1.6.0 built with
Hadoop 2.3.0. I've got CDH 5.0.2 deployed. The project has an ingest
component which is running all the time with a batch writer using many
threads to push mutations into Accumulo.

The issue I'm having is a show stopper. At different intervals of time,
sometimes an hour, sometimes 30 minutes, I'm getting
MutationsRejectedExceptions (server errors) from the
TabletServerBatchWriter. Once they start, I need to restart the ingester to
get them to stop. They always come back within 30 minutes to an hour...
rinse, repeat.

The exception always happens on different tablet servers. It's a thrift
error saying a message was received out of sequence. In the TabletServer
logs, I see an "Invalid session id" exception which happens only once
before the client-side batch writer starts spitting out the MREs.

I'm running some heavyweight processing in Storm alongside the tablet
servers. I shut that processing off in hopes that maybe it was the culprit
but that hasn't fixed the issue.

I'm surprised I haven't seen any other posts on the topic.

Thanks!


Re: Trident topology: sliding window aggregation?

2014-08-18 Thread Corey Nolet
I've been working on a project to use sliding windows more effectively in
Storm with a similar higher level builder like Trident.

https://github.com/calrissian/flowmix

Unfortunately, it does not directly translate to Trident but it could give
you an idea of what it would take to implement sliding/tumbling windows and
various different things you can do with them.




On Mon, Aug 18, 2014 at 3:15 AM, Krzysztof Zarzycki 
wrote:

> I bump the thread as I'm also looking for the answer to this question. If
> anyone has any ideas to share, that would be great!
>
> Thanks,
> - Zarzyk
>
>
> 2014-07-23 19:30 GMT+02:00 A.M. :
>
> While looking into storm Trident, how could we achieve sliding window
>> functions with a Trident topology, as demoed in Spark Streaming:
>>
>> val tagCounts = hashTags.window(Minutes(10), Seconds(1)).countByValue()
>>
>> I've read about the regular storm topology using tick tuples to do the
>> sliding window. In trident, how could we do the same?
>>
>> Thanks.
>>
>> -Costco
>>
>>
>


Re: Kafka + Storm

2014-08-14 Thread Corey Nolet
Kafka is also distributed in nature, which is not something easily achieved
by queuing brokers like ActiveMQ or JMS (1.0) in general. Kafka allows data
to be partitioned across many machines which can grow as necessary as your
data grows.




On Thu, Aug 14, 2014 at 11:20 PM, Justin Workman 
wrote:

> Absolutely!
>
> Sent from my iPhone
>
> On Aug 14, 2014, at 9:02 PM, anand nalya  wrote:
>
> I agree, not for the long run but for small bursts in data production
> rate, say peak hours, Kafka can help in providing a somewhat consistent
> load on Storm cluster.
> --
> From: Justin Workman 
> Sent: ‎15-‎08-‎2014 07:53
> To: user@storm.incubator.apache.org
> Subject: Re: Kafka + Storm
>
> I suppose not directly.  It depends on the lifetime of your Kafka queues
> and on your latency requirements. You need to make sure you have enough
> "doctors" or in storm language workers, in your storm cluster to process
> your messages within your SLA.
>
> For our case we, we have a 3 hour lifetime or ttl configured for our
> queues. Meaning records in the queue older than 3 hours are purged. We also
> have an internal SLA ( team goal, not published to the business ;)) of 10
> seconds from event to end of stream and available for end user consumption.
>
> So we need to make sure we have enough storm workers to meet: 1) the
> normal SLA and 2) be able to "catch up" on the queues when we have to take
> storm down for maintenance and such and the queues build.
>
> There are many knobs you can tune for both storm and Kafka. We have spent
> many hours tuning things to meet our SLAs.
>
> Justin
>
> Sent from my iPhone
>
> On Aug 14, 2014, at 8:05 PM, anand nalya  wrote:
>
> Also, since Kafka acts as a buffer, storm is not directly affected by the
> speed of your data sources/producers.
> --
> From: Justin Workman 
> Sent: ‎15-‎08-‎2014 07:12
> To: user@storm.incubator.apache.org
> Subject: Re: Kafka + Storm
>
> Good analogy!
>
> Sent from my iPhone
>
> On Aug 14, 2014, at 7:36 PM, "Adaryl \"Bob\" Wakefield, MBA" <
> adaryl.wakefi...@hotmail.com> wrote:
>
>  Ah so Storm is the hospital and Kafka is the waiting room where
> everybody queues up to be seen in turn yes?
>
> Adaryl "Bob" Wakefield, MBA
> Principal
> Mass Street Analytics
> 913.938.6685
> www.linkedin.com/in/bobwakefieldmba
> Twitter: @BobLovesData
>
>  *From:* Justin Workman 
> *Sent:* Thursday, August 14, 2014 7:47 PM
> *To:* user@storm.incubator.apache.org
> *Subject:* Re: Kafka + Storm
>
>  If you are familiar with Weblogic or ActiveMQ, it is similar. Let's see
> if I can explain, I am definitely not a subject matter expert on this.
>
> Within Kafka you can create "queues", ie a webclicks queue. Your web
> servers can then send click events to this queue in Kafka. The web servers,
> or agent writing the events to this queue are referred to as the
> "producer".  Each event, or message in Kafka is assigned an id.
>
> On the other side there are "consumers", in storms case this would be the
> storm Kafka spout, that can subscribe to this webclicks queue to consume
> the messages that are in the queue. The consumer can consume a single
> message from the queue, or a batch of messages, as storm does. The consumer
> keeps track of the latest offset, Kafka message id, that it has consumed.
> This way the next time the consumer checks to see if there are more
> messages to consume it will ask for messages with a message id greater than
> its last offset.
>
> This helps with the reliability of the event stream and helps guarantee
> that your events/message make it start to finish through your stream,
> assuming the events get to Kafka ;)
>
> Hope this helps and makes some sort of sense. Again, sent from my iPhone ;)
>
> Justin
>
> Sent from my iPhone
>
> On Aug 14, 2014, at 6:28 PM, "Adaryl \"Bob\" Wakefield, MBA" <
> adaryl.wakefi...@hotmail.com> wrote:
>
>   I get your reasoning at a high level. I should have specified that I
> wasn’t sure what Kafka does. I don’t have a hard software engineering
> background. I know that Kafka is “a message queuing” system, but I don’t
> really know what that means.
>
> (I can’t believe you wrote all that from your iPhone)
> B.
>
>
>  *From:* Justin Workman 
> *Sent:* Thursday, August 14, 2014 7:22 PM
> *To:* user@storm.incubator.apache.org
> *Subject:* Re: Kafka + Storm
>
>  Personally, we looked at several options, including writing our own
> storm source. There are limited storm sources with community support out
> there. For us, it boiled down to the following;
>
> 1) community support and what appeared to be a standard method. Storm has
> now included the kafka source as a bundled component to storm. This made
> the implementation much faster, because the code was done.
> 2) the durability (replication and clustering) of Kafka. We have a three
> hour retention period on our queues, so if we need to do maintenance on
> storm or deploy an updated topology, we don't need to s

Re: Good way to test when topology in local cluster is "fully active"

2014-08-05 Thread Corey Nolet
This did work. Thanks!


On Tue, Aug 5, 2014 at 2:23 PM, P. Taylor Goetz  wrote:

> My guess is that the slowdown you are seeing is a result of the new
> version of ZooKeeper and how it handles IPv4/6.
>
> Try adding the following JVM parameter when running your tests:
>
> -Djava.net.preferIPv4Stack=true
>
> -Taylor
>
> On Aug 4, 2014, at 8:49 PM, Corey Nolet  wrote:
>
> > I'm testing some sliding window algorithms with tuples emitted from a
> mock spout based on a timer but the amount of time it takes the topology to
> fully start up and activate seems to vary from computer to computer.
> Specifically, I just updated from 0.8.2 to 0.9.2-incubating and all of my
> tests are breaking because the time to activate the topology is taking
> longer (because of Netty possibly?). I'd like to make my tests more
> resilient to things like this.
> >
> > Is there something I can look at in LocalCluster where I could do
> "while(!notActive) { Thread.sleep(50) }" ?
> >
> > This is what my test looks like currently:
> >
> >   StormTopology topology = buildTopology(...);
> >   Config conf = new Config();
> >   conf.setNumWorkers(1);
> >
> >   LocalCluster cluster = new LocalCluster();
> >   cluster.submitTopology(getTopologyName(), conf, topology);
> >
> >   try {
> > Thread.sleep(4000);
> >   } catch (InterruptedException e) {
> > e.printStackTrace();
> >   }
> >
> >   cluster.shutdown();
> >
> >   assertEquals(4, MockSinkBolt.getEvents().size());
> >
> >
> >
> > Thanks!
> >
> >
>
>


Re: Good way to test when topology in local cluster is "fully active"

2014-08-05 Thread Corey Nolet
Sorry- the ipv4 fix worked.


On Tue, Aug 5, 2014 at 9:13 PM, Corey Nolet  wrote:

> This did work. Thanks!
>
>
> On Tue, Aug 5, 2014 at 2:23 PM, P. Taylor Goetz  wrote:
>
>> My guess is that the slowdown you are seeing is a result of the new
>> version of ZooKeeper and how it handles IPv4/6.
>>
>> Try adding the following JVM parameter when running your tests:
>>
>> -Djava.net.preferIPv4Stack=true
>>
>> -Taylor
>>
>> On Aug 4, 2014, at 8:49 PM, Corey Nolet  wrote:
>>
>> > I'm testing some sliding window algorithms with tuples emitted from a
>> mock spout based on a timer but the amount of time it takes the topology to
>> fully start up and activate seems to vary from computer to computer.
>> Specifically, I just updated from 0.8.2 to 0.9.2-incubating and all of my
>> tests are breaking because the time to activate the topology is taking
>> longer (because of Netty possibly?). I'd like to make my tests more
>> resilient to things like this.
>> >
>> > Is there something I can look at in LocalCluster where I could do
>> "while(!notActive) { Thread.sleep(50) }" ?
>> >
>> > This is what my test looks like currently:
>> >
>> >   StormTopology topology = buildTopology(...);
>> >   Config conf = new Config();
>> >   conf.setNumWorkers(1);
>> >
>> >   LocalCluster cluster = new LocalCluster();
>> >   cluster.submitTopology(getTopologyName(), conf, topology);
>> >
>> >   try {
>> > Thread.sleep(4000);
>> >   } catch (InterruptedException e) {
>> > e.printStackTrace();
>> >   }
>> >
>> >   cluster.shutdown();
>> >
>> >   assertEquals(4, MockSinkBolt.getEvents().size());
>> >
>> >
>> >
>> > Thanks!
>> >
>> >
>>
>>
>


Re: Good way to test when topology in local cluster is "fully active"

2014-08-05 Thread Corey Nolet
Vincent & P.Taylor,

I played with the testing framework for a little bit last night and don't
see any easy way to provide pauses in between the emissions of the mock
tuples. For instance, my sliding window semantics are heavily orchestrated
by time evictions and triggers, which means that I need to be able to time
the tuples being fed into the tests (i.e. emit a tuple every 500ms and run
the test for 25 seconds).


On Tue, Aug 5, 2014 at 2:23 PM, P. Taylor Goetz  wrote:

> My guess is that the slowdown you are seeing is a result of the new
> version of ZooKeeper and how it handles IPv4/6.
>
> Try adding the following JVM parameter when running your tests:
>
> -Djava.net.preferIPv4Stack=true
>
> -Taylor
>
> On Aug 4, 2014, at 8:49 PM, Corey Nolet  wrote:
>
> > I'm testing some sliding window algorithms with tuples emitted from a
> mock spout based on a timer but the amount of time it takes the topology to
> fully start up and activate seems to vary from computer to computer.
> Specifically, I just updated from 0.8.2 to 0.9.2-incubating and all of my
> tests are breaking because the time to activate the topology is taking
> longer (because of Netty possibly?). I'd like to make my tests more
> resilient to things like this.
> >
> > Is there something I can look at in LocalCluster where I could do
> "while(!notActive) { Thread.sleep(50) }" ?
> >
> > This is what my test looks like currently:
> >
> >   StormTopology topology = buildTopology(...);
> >   Config conf = new Config();
> >   conf.setNumWorkers(1);
> >
> >   LocalCluster cluster = new LocalCluster();
> >   cluster.submitTopology(getTopologyName(), conf, topology);
> >
> >   try {
> > Thread.sleep(4000);
> >   } catch (InterruptedException e) {
> > e.printStackTrace();
> >   }
> >
> >   cluster.shutdown();
> >
> >   assertEquals(4, MockSinkBolt.getEvents().size());
> >
> >
> >
> > Thanks!
> >
> >
>
>


Re: Good way to test when topology in local cluster is "fully active"

2014-08-04 Thread Corey Nolet
Nevermind, I wrote that before looking. This has been around since 0.8.1.
Thanks again Vincent!


On Mon, Aug 4, 2014 at 11:01 PM, Corey Nolet  wrote:

> Oh Nice. Is this new in 0.9.*? I just updated so I haven't looked much
> into what's changed yet, other than Netty.
>
>
> On Mon, Aug 4, 2014 at 10:40 PM, Vincent Russell <
> vincent.russ...@gmail.com> wrote:
>
>> Corey,
>>
>> Have you tried using the integration testing framework that comes with
>> storm?
>>
>>
>> Testing.withSimulatedTimeLocalCluster(mkClusterParam,
>>  new TestJob() {
>> @Override
>> public void run(ILocalCluster cluster) throws Exception {
>>
>> CompleteTopologyParam completeTopologyParam = new CompleteTopologyParam();
>> completeTopologyParam
>> .setMockedSources(mockedSources);
>>  completeTopologyParam.setStormConf(daemonConf);
>>
>> completeTopologyParam.setTopologyName(getTopologyName());
>> Map result = Testing.completeTopology(cluster,
>>  topology, completeTopologyParam);
>>
>> });
>>
>> -Vincent
>>
>>  On Mon, Aug 4, 2014 at 8:49 PM, Corey Nolet  wrote:
>>
>>> I'm testing some sliding window algorithms with tuples emitted from a
>>> mock spout based on a timer but the amount of time it takes the topology to
>>> fully start up and activate seems to vary from computer to computer.
>>> Specifically, I just updated from 0.8.2 to 0.9.2-incubating and all of my
>>> tests are breaking because the time to activate the topology is taking
>>> longer (because of Netty possibly?). I'd like to make my tests more
>>> resilient to things like this.
>>>
>>> Is there something I can look at in LocalCluster where I could do
>>> "while(!notActive) { Thread.sleep(50) }" ?
>>>
>>> This is what my test looks like currently:
>>>
>>>   StormTopology topology = buildTopology(...);
>>>   Config conf = new Config();
>>>   conf.setNumWorkers(1);
>>>
>>>   LocalCluster cluster = new LocalCluster();
>>>   cluster.submitTopology(getTopologyName(), conf, topology);
>>>
>>>   try {
>>> Thread.sleep(4000);
>>>   } catch (InterruptedException e) {
>>> e.printStackTrace();
>>>   }
>>>
>>>   cluster.shutdown();
>>>
>>>   assertEquals(4, MockSinkBolt.getEvents().size());
>>>
>>>
>>>
>>> Thanks!
>>>
>>>
>>>
>>
>


Re: Good way to test when topology in local cluster is "fully active"

2014-08-04 Thread Corey Nolet
Oh Nice. Is this new in 0.9.*? I just updated so I haven't looked much into
what's changed yet, other than Netty.


On Mon, Aug 4, 2014 at 10:40 PM, Vincent Russell 
wrote:

> Corey,
>
> Have you tried using the integration testing framework that comes with
> storm?
>
>
> Testing.withSimulatedTimeLocalCluster(mkClusterParam,
>  new TestJob() {
> @Override
> public void run(ILocalCluster cluster) throws Exception {
>
> CompleteTopologyParam completeTopologyParam = new CompleteTopologyParam();
> completeTopologyParam
> .setMockedSources(mockedSources);
>  completeTopologyParam.setStormConf(daemonConf);
>
> completeTopologyParam.setTopologyName(getTopologyName());
> Map result = Testing.completeTopology(cluster,
>  topology, completeTopologyParam);
>
> });
>
> -Vincent
>
> On Mon, Aug 4, 2014 at 8:49 PM, Corey Nolet  wrote:
>
>> I'm testing some sliding window algorithms with tuples emitted from a
>> mock spout based on a timer but the amount of time it takes the topology to
>> fully start up and activate seems to vary from computer to computer.
>> Specifically, I just updated from 0.8.2 to 0.9.2-incubating and all of my
>> tests are breaking because the time to activate the topology is taking
>> longer (because of Netty possibly?). I'd like to make my tests more
>> resilient to things like this.
>>
>> Is there something I can look at in LocalCluster where I could do
>> "while(!notActive) { Thread.sleep(50) }" ?
>>
>> This is what my test looks like currently:
>>
>>   StormTopology topology = buildTopology(...);
>>   Config conf = new Config();
>>   conf.setNumWorkers(1);
>>
>>   LocalCluster cluster = new LocalCluster();
>>   cluster.submitTopology(getTopologyName(), conf, topology);
>>
>>   try {
>> Thread.sleep(4000);
>>   } catch (InterruptedException e) {
>> e.printStackTrace();
>>   }
>>
>>   cluster.shutdown();
>>
>>   assertEquals(4, MockSinkBolt.getEvents().size());
>>
>>
>>
>> Thanks!
>>
>>
>>
>


Good way to test when topology in local cluster is "fully active"

2014-08-04 Thread Corey Nolet
I'm testing some sliding window algorithms with tuples emitted from a mock
spout based on a timer but the amount of time it takes the topology to
fully start up and activate seems to vary from computer to computer.
Specifically, I just updated from 0.8.2 to 0.9.2-incubating and all of my
tests are breaking because the time to activate the topology is taking
longer (because of Netty possibly?). I'd like to make my tests more
resilient to things like this.

Is there something I can look at in LocalCluster where I could do
"while(!notActive) { Thread.sleep(50) }" ?

This is what my test looks like currently:

  StormTopology topology = buildTopology(...);
  Config conf = new Config();
  conf.setNumWorkers(1);

  LocalCluster cluster = new LocalCluster();
  cluster.submitTopology(getTopologyName(), conf, topology);

  try {
Thread.sleep(4000);
  } catch (InterruptedException e) {
e.printStackTrace();
  }

  cluster.shutdown();

  assertEquals(4, MockSinkBolt.getEvents().size());



Thanks!


Re: Z-Curve/Hilbert Curve

2014-07-28 Thread Corey Nolet
On Calrissian, we had once considered making a lexicoder for lat/long that
really transformed the two-dimensions (lat/lon) down into a geohash based
on the z-curve.

The reason we decided against a first class data-type for this is exactly
the same reason that Anthony brings up in his previous comment- the geohash
only makes sense as a query tool when you have a good way to structure (at
least as many as possible) the non-sequential ranges you would need in
order to find all the overlapping regions (with minimal as possible amount
of false-positives).
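
For reference, the core of that kind of encoding is just interleaving the
bits of the two normalized dimensions. A rough sketch of the idea (not the
lexicoder we had considered; precision handling and boundary clamping are
omitted):

// Interleave bitsPerDim bits of normalized lat/lon into one z-order long.
public static long zOrder(double lat, double lon, int bitsPerDim) {
    long latBits = (long) ((lat + 90.0) / 180.0 * (1L << bitsPerDim));
    long lonBits = (long) ((lon + 180.0) / 360.0 * (1L << bitsPerDim));
    long z = 0L;
    for (int i = 0; i < bitsPerDim; i++) {
        z |= ((lonBits >> i) & 1L) << (2 * i);      // even bits from lon
        z |= ((latBits >> i) & 1L) << (2 * i + 1);  // odd bits from lat
    }
    return z;
}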


On Mon, Jul 28, 2014 at 2:00 PM, Jared Winick  wrote:

> As several people have commented, a single range for a query can produce a
> lot of false positives that need to be filtered out. I had made this
> visualization a while back that lets you interactively (click-drag a
> bounding box) see this behavior.
>
> http://bl.ocks.org/jaredwinick/5073432
>
>
>
>
> On Sun, Jul 27, 2014 at 2:15 PM, Anthony Fox  wrote:
>
>> My first thought was just something simple for a first pass - lat/lon ->
>> a single lexicographic dimension -  as it would cover most basic use
>> cases.  Precision (number of bits in encoding) could be an arg or a config
>> variable.  For WITHIN/INTERSECTS topological predicates, we need to
>> decompose the query geometry into the (possibly/probably non-contiguous) 1D
>> ranges that cover the region in question.  GeoMesa has an algorithm to
>> quickly perform this decomposition that computes the minimum number of
>> geohash prefixes at different resolutions to fully cover the query
>> polygon.  Basically, it recursively traverses through levels of geohash
>> resolutions, prioritizing rectangles that intersect the region and
>> discarding non-intersecting rectangles at the lowest precisions, to produce
>> a sequence of ranges to scan over.  Fully contained rectangles are
>> discovered at their lowest resolution at which point the algorithm pops the
>> stack and searches the next prioritized prefix.  I think something like
>> this would definitely need to be ported over and included in a lexicoder
>> implementation to make it useful.  Also, rather than materialize the entire
>> set of ranges in memory, we can either return a lazy iterator of prefixes
>> that can be fed into a scanner in batches or we can have a short-circuit
>> config that tunes the amount of slop that's tolerable and cuts off the
>> traversal at a certain level of precision.  GeoMesa uses something like the
>> former on attribute indexes to coordinate parallel scanners on separate
>> index and record tables.
>>
>> Thoughts?  I'm inclined to keep the implementation to the bare minimum
>> necessary for the basic use cases (lat/lon and bbox queries) though I do
>> think a general dimensionality reducing lexicoder would be very useful.
>>
>>
>>
>> On Fri, Jul 25, 2014 at 6:51 PM, Chris Bennight 
>> wrote:
>>
>>> A couple of related issues come up when considering implementing a
>>> dimensionality reducing encoding -- just want to toss those out to see what
>>> people think the interface might look like.
>>>
>>> There's a couple of aspects that could be brought in here, but lets keep
>>> it simple and considering the original question: (lat/lon) -> number.
>>>
>>> --Desired precision of the binning process
>>> The more bits we add to the z-curve, the more precise our comparison -
>>> i.e. a 63 bit key would have more "locations" to sort by than a 24 bit key.
>>>
>>> Would you see a reasonable default getting picked, make this user
>>> configurable, or both? (i.e. default to a value, extended options with a
>>> new constructor?)
>>>
>>> --Semantics for turning two lat/long pairs into a range
>>> I'm extrapolating here, but the only reason I see that locality matters
>>> is if we want to preserve locality for range searches.  The internal
>>> implementation of the encoding/lexicoding process is going to directly
>>> impact the implementation of the range query.
>>> Now sure, someone could encode the lower left point, encode the upper
>>> right point, and construct a range out of that to pass for a scan, but
>>> that's going to be wildly inefficient in most cases. See:
>>> https://dl.dropboxusercontent.com/u/6649380/bbox.png
>>> If we just lexicode the lower left and upper right we traverse across
>>> the entire curve - hitting lots of areas that aren't actually in the
>>> original range.
>>> Now we can turn a single 2D range into a set of 1D ranges.   There is
>>> some potential tuning here now, as the algorithm has a tradeoff on time to
>>> compute the ranges (and number of ranges) vs.  "slop"  (or inclusion of
>>> ranges which aren't actually in the original query).
>>> Would you see a static method perhaps on the z-curve lexicoder that
>>> returns a series of ranges based on an input window?  Some other mechanism?
>>> And in the case of "slop" - would we just document that the ranges could
>>> actually include values not expected - or would we always fully decompose?
>>>
>>> --

InterruptedException in PathChildrenCache

2014-07-15 Thread Corey Nolet
I'm using a leaderSelector and path children cache together. During a
shutdown method, I'm getting an InterruptedException and i'm having trouble
figuring out a graceful way to handle it.

java.lang.InterruptedException

at
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1038)

at
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)

at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)

at
org.apache.curator.CuratorZookeeperClient.internalBlockUntilConnectedOrTimedOut(CuratorZookeeperClient.java:324)

at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:105)

at
org.apache.curator.utils.EnsurePath$InitialHelper.ensure(EnsurePath.java:140)

at org.apache.curator.utils.EnsurePath.ensure(EnsurePath.java:99)

at
org.apache.curator.framework.recipes.cache.PathChildrenCache.refresh(PathChildrenCache.java:481)

at
org.apache.curator.framework.recipes.cache.RefreshOperation.invoke(RefreshOperation.java:35)

at
org.apache.curator.framework.recipes.cache.PathChildrenCache$10.run(PathChildrenCache.java:762)

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)

at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)

at java.util.concurrent.FutureTask.run(FutureTask.java:166)

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)

at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)

at java.util.concurrent.FutureTask.run(FutureTask.java:166)

at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:724)



Here's my shutdown code.

  if
(curatorFramework.getState().equals(CuratorFrameworkState.STARTED)) {

pathChildrenCache.close();

leaderSelector.close();

curatorFramework.close();

  }


I'm assuming the closing of PathChildrenCache and LeaderSelector is
probably asynchronous? Adding Thread.sleep(2000) does not seem to help,
though. Is there an accepted way of closing these and waiting?


Re: Retrieving Rows in Reverse (Descending) Order

2014-06-30 Thread Corey Nolet
Donald,

Thanks for the share! A while ago, I stumbled across leveldb and I also
pondered the ability to support reverse scanning as a first class feature
in Accumulo. The numbers on that ticket are not surprising, however- we'd
have to assume there'd be a trade-off to make. The trade-off between ingest
throughput and total space consumed vs higher query latency may very well
be worth it for many use cases.

+1 for filing a ticket.



On Mon, Jun 30, 2014 at 12:52 PM, Donald Miner 
wrote:

> Looks like it was easier said than done for HBase, but they did it:
> https://issues.apache.org/jira/browse/HBASE-4811
>
>
>
>
> On Mon, Jun 30, 2014 at 12:46 PM, Andrew Wells 
> wrote:
>
> So, if you need to do both ascending and descending order, I would need
>> to do 2 writes for each record going in... that might not be possible in
>> our situation.
>>
>>
>> On Mon, Jun 30, 2014 at 12:26 PM, Corey Nolet  wrote:
>>
>>> Andrew,
>>>
>>> Our recommendation on this has typically been to reverse the sort order
>>> of the keys on ingest.
>>>
>>>
>>> On Mon, Jun 30, 2014 at 12:24 PM, Andrew Wells 
>>> wrote:
>>>
>>>> Are there currently any good practices on doing this?
>>>>
>>>> Especially when a rowId has a large number of Keys.
>>>>
>>>> --
>>>> *Andrew George Wells*
>>>> *Software Engineer*
>>>> *awe...@clearedgeit.com *
>>>>
>>>>
>>>
>>
>>
>> --
>> *Andrew George Wells*
>> *Software Engineer*
>> *awe...@clearedgeit.com *
>>
>>
>
>
> --
>
> Donald Miner
> Chief Technology Officer
> ClearEdge IT Solutions, LLC
> Cell: 443 799 7807
> www.clearedgeit.com
>


Re: Retrieving Rows in Reverse (Descending) Order

2014-06-30 Thread Corey Nolet
Andrew,

Our recommendation on this has typically been to reverse the sort order of
the keys on ingest.
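
A sketch of what that usually looks like, assuming the rows are keyed by a
timestamp-like long (the helper name here is made up):

// Store rows under (Long.MAX_VALUE - ts), zero-padded so the lexicographic
// order matches the numeric order; a forward scan then returns newest first.
static String descendingRowId(long ts) {
    return String.format("%019d", Long.MAX_VALUE - ts);
}

// e.g. new Mutation(descendingRowId(System.currentTimeMillis()))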


On Mon, Jun 30, 2014 at 12:24 PM, Andrew Wells 
wrote:

> Are there currently any good practices on doing this?
>
> Especially when a rowId has a large number of Keys.
>
> --
> *Andrew George Wells*
> *Software Engineer*
> *awe...@clearedgeit.com *
>
>


LeaderSelector throwing exception on mutex.acquire()

2014-06-20 Thread Corey Nolet
The exception below is happening quite frequently in my code. It looks like
it's because I'm trying to do operations with a client before it has fully
initiated a connection to Zookeeper. What is the recommended way to wait
until the curatorFramework is connected? Apply a connectionListener and
block until a state of CONNECTED is passed into the listener?
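
Something like this is what I had in mind (a rough sketch; the listener types
come from org.apache.curator.framework.state):

// Block until the client reports CONNECTED once, then start the recipes.
final CountDownLatch connected = new CountDownLatch(1);
curatorFramework.getConnectionStateListenable().addListener(new ConnectionStateListener() {
    @Override
    public void stateChanged(CuratorFramework client, ConnectionState newState) {
        if (newState == ConnectionState.CONNECTED) {
            connected.countDown();
        }
    }
});
curatorFramework.start();
connected.await(30, TimeUnit.SECONDS);
leaderSelector.start();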


Thanks!

2014-06-20 22:10:08,335 [leader.LeaderSelector] ERROR: mutex.acquire()
threw an exception

java.lang.IllegalStateException: instance must be started before calling
this method

at com.google.common.base.Preconditions.checkState(Preconditions.java:149)

at
org.apache.curator.framework.imps.CuratorFrameworkImpl.delete(CuratorFrameworkImpl.java:358)

at
org.apache.curator.framework.recipes.locks.LockInternals.deleteOurPath(LockInternals.java:345)

at
org.apache.curator.framework.recipes.locks.LockInternals.internalLockLoop(LockInternals.java:335)

at
org.apache.curator.framework.recipes.locks.LockInternals.attemptLock(LockInternals.java:225)

at
org.apache.curator.framework.recipes.locks.InterProcessMutex.internalLock(InterProcessMutex.java:221)

at
org.apache.curator.framework.recipes.locks.InterProcessMutex.acquire(InterProcessMutex.java:77)

at
org.apache.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:385)

at
org.apache.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:443)

at
org.apache.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:63)

at
org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:244)

at
org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:238)

at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)

at java.util.concurrent.FutureTask.run(FutureTask.java:166)

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)

at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)

at java.util.concurrent.FutureTask.run(FutureTask.java:166)

at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:724)


Why do the fluent methods on the CuratorFramework throw Exception instead of InterruptedException and KeeperException

2014-06-20 Thread Corey Nolet
Just curious. Was there a reason for that?


Re: Is Data Locality Helpful? (or why run tserver and datanode on the same box?)

2014-06-19 Thread Corey Nolet
AFAIK, the locality may not be guaranteed right away unless the data for a
tablet was first ingested on the tablet server that is responsible for that
tablet; otherwise, you'll need to wait for a major compaction to rewrite the
RFiles locally on that tablet server. I would assume that if the tablet
server is not on the same node as the datanode, those files will probably be
spread across the cluster as if you were ingesting data from outside the
cloud.

A recent discussion with Bill Slacum also brought to light a possible
problem of the HDFS balancer [1] re-balancing blocks after the fact which
could eventually pull blocks onto datanodes that are not local to the
tablets. I believe the remedy for this was to turn off the balancer or not have
it run.

[1]
http://www.swiss-scalability.com/2013/08/hadoop-hdfs-balancer-explained.html




On Thu, Jun 19, 2014 at 10:07 AM, David Medinets 
wrote:

> At the Accumulo Summit and on a recent client site, there have been
> conversations about Data Locality and Accumulo.
>
> I ran an experiment to see that Accumulo can scan tables when the
> tserver process is run on a server without a datanode process. I
> followed these steps:
>
> 1. Start three node cluster
> 2. Load data
> 3. Kill datanode on slave1
> 4. Wait until Hadoop notices dead node.
> 5. Kill tserver on slave2
> 6. Wait until Accumulo notices dead node.
> 7. Run the accumulo shell on master and slave1 to verify entries can be
> scanned.
>
> Accumulo handled this situation just fine. As I expected.
>
> How important (or not) is it to run tserver and datanode on the same
> server?
> Does the Data Locality implied by running them together exist?
> Can the benefit be quantified?
>


Time to release 1.6.1?

2014-06-19 Thread Corey Nolet
I'd like to start getting a candidate together if there are no objections.

It looks like we have 65 resolved tickets with a fix version of 1.6.1.


Re: Mini Accumulo Cluster Use Case: Development and Training

2014-06-13 Thread Corey Nolet
Wouldn't that take care of ACCUMULO-1378?


On Fri, Jun 13, 2014 at 12:56 PM, Keith Turner  wrote:

> On Thu, Jun 12, 2014 at 11:44 PM, Vicky Kak  wrote:
>
> > Rather than having new development, can't we add these features to the
> > existing accumulo command line? I think this makes life easier, and these
> > are not there in the existing accumulo command line tool:
> >
> > 1. Persist the entire state of mini accumulo
> >
>
> It would be nice if the default behavior of requiring an empty directory
> remained the same.  Some users may depend on this behaviour.  An option
> could be added to MiniAccumuloConfig that allows a user to enable directory
> reuse.
>
>
> > 4. Add a function to force re-initialization of the  MAC
> >
> >
> > I just scanned the code but would like to run it too. Instantly I can say
> > that the accumulo command line should be updated with these
> > functionalities.
> >
> > Thanks,
> > Vicky
> >
> >
> >
> >
> >
> > On Fri, Jun 13, 2014 at 3:40 AM, Andrew Wells 
> > wrote:
> >
> > > I developed this tool for doing persistent Mini Accumulo Cluster for
> > > training, and others have said it would be useful for doing
> Development.
> > >
> > > It does the following,
> > >
> > > Allows for optional persistence of the tables.
> > >
> > > Allows for shell access to MAC
> > >
> > > Here is the tool on github as it stands, it is pretty down and
> dirty:
> > >
> > > https://github.com/agwells0714/AccumuloDeveloperUtil
> > >
> > >
> > > I would like to start contributing code to OBSOLETE this project.
> > >
> > > I imagine the following would satisfy this requirement.
> > >
> > > 1. Persist the entire state of mini accumulo
> > >
> > > 2. allow shell access to MAC
> > >
> > > 3. allow option to also start a monitor (for additional testing)
> > >
> > > 4. Add a function to force re-initialization of the  MAC
> > >
> > >
> > > Thoughts? Suggestions?
> > >
> > > --
> > > *Andrew George Wells*
> > > *Software Engineer*
> > > *awe...@clearedgeit.com *
> > >
> >
>


Re: [VOTE] Storm Logo Contest - Final Round

2014-06-10 Thread Corey Nolet
6 - 1pt
9 - 4pt


On Tue, Jun 10, 2014 at 8:34 PM, Nathan Leung  wrote:

> #10 - 5
>
>
> On Tue, Jun 10, 2014 at 5:56 PM, Osman  wrote:
>
>> 9 5
>> On Jun 9, 2014 7:39 PM, "P. Taylor Goetz"  wrote:
>>
>>> This is a call to vote on selecting the winning Storm logo from the 3
>>> finalists.
>>>
>>> The three candidates are:
>>>
>>>  * [No. 6 - Alec Bartos](
>>> http://storm.incubator.apache.org/2014/04/23/logo-abartos.html)
>>>  * [No. 9 - Jennifer Lee](
>>> http://storm.incubator.apache.org/2014/04/29/logo-jlee1.html)
>>>  * [No. 10 - Jennifer Lee](
>>> http://storm.incubator.apache.org/2014/04/29/logo-jlee2.html)
>>>
>>> VOTING
>>>
>>> Each person can cast a single vote. A vote consists of 5 points that can
>>> be divided among multiple entries. To vote, list the entry number, followed
>>> by the number of points assigned. For example:
>>>
>>> #1 - 2 pts.
>>> #2 - 1 pt.
>>> #3 - 2 pts.
>>>
>>> Votes cast by PPMC members are considered binding, but voting is open to
>>> anyone. In the event of a tie vote from the PPMC, votes from the community
>>> will be used to break the tie.
>>>
>>> This vote will be open until Monday, June 16 11:59 PM UTC.
>>>
>>> - Taylor
>>>
>>
>


Re: Accumulo Summit Hackathon

2014-06-09 Thread Corey Nolet
+1 on the Ganglia integration.

Also, while we're on the topic of github projects for integrating with
Accumulo, I'd like to see Accismus worked on as well.

https://github.com/keith-turner/Accismus


On Mon, Jun 9, 2014 at 7:11 PM, Alex Moundalexis 
wrote:

> I would love to see metrics2 from Hadoop Common implemented. Easier Ganglia
> integration without the use of JMX and conversion utilities.
>
> https://issues.apache.org/jira/browse/ACCUMULO-1817
>
> I knew I remembered an issue being open. :)
>
>
> On Sun, Jun 8, 2014 at 8:38 PM, Tamburello, Paul [USA] <
> tamburello_p...@bah.com> wrote:
>
> > Accumulo Development Team -
> >
> > As many of you already know, this week during the Accumulo Summit<
> > http://accumulosummit.com/>, Booz Allen Hamilton is sponsoring a
> > Hackathon event following
> > the conference, from 5PM til 11PM.  As part of the agenda, we are working
> > to put together a list of existing JIRA tickets that folks can work on
> > during the Hackathon, so we want to solicit input from the Accumulo
> > contributors. So, if you have suggestions for tasks that people can work
> > on, please respond directly to me and we will add them to our list.
> >
> > Thanks in advance, see you all on Thursday!
> > Paul
> >
> > Paul Tamburello
> > Senior Lead Engineer
> > Strategic Innovation Group
> > 301-821-8861 / 919-260-6158
> > tamburello_p...@bah.com
> >
>


Re: Force leader

2014-05-28 Thread Corey Nolet
Jordan,

I'm still wrestling with the recommended way to implement this. Basically
what I want is a normal leader selection algorithm but with the ability to
force a leader sometimes.

Ultimately, it would be amazing if I could do this:


leaderSelector.forceLead("id");

I've tried having a node in Zookeeper that the leader is watching so that
it can revoke leadership when the node is updated. As each successive node
is elected leader, it checks whether it is the forced leader and, if not,
revokes right away. This definitely seems like a hack.
I'd like to try to stay in the Curator framework to implement this if
possible.
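
Concretely, the hack looks roughly like this inside the LeaderSelectorListener
(the znode path, myId, and doWorkUntilRevoked() are made up, and autoRequeue()
is enabled so nodes re-enter the election after giving up):

@Override
public void takeLeadership(CuratorFramework client) throws Exception {
    // Bail out immediately if this node is not the one being forced;
    // returning from takeLeadership relinquishes leadership.
    byte[] forced = client.getData().forPath("/myapp/forced-leader");
    if (!myId.equals(new String(forced))) {
        return;
    }
    // Otherwise do the real work for this group until leadership should end.
    doWorkUntilRevoked();
}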

Thanks!




On Wed, May 21, 2014 at 1:34 PM, Corey Nolet  wrote:

> If it helps, here's a rough design of the project:
>
> https://github.com/calrissian/conductor
>
>
> On Wed, May 21, 2014 at 1:33 PM, Corey Nolet  wrote:
>
>> Jordan,
>>
>> Thanks for your quick response! So what I am building is a fault-tolerant
>> framework for linux systems to watch over some number of processes and,
>> when a process goes down and can't be brought back up (disk space, memory,
>> etc...), I want the process watcher to revoke its leadership so that a
>> different machine can start up the process, do any configuration necessary
>> to route clients over to that machine, and assume responsibility as the
>> lead for that process.
>>
>> Often times what happens is a couple physical machines go down and ALL
>> the managed processes end up on a single machine.
>>
>> I'm giving the users control over groups of processes. A group would be
>> "amqp broker" or "web server". Once a sys admin comes into work and
>> realizes what happened, they are going to want to force the amqp broker
>> back to a specific node and force the web server onto some other node. The
>> way I've designed it, this means those nodes need to be forced to be the
>> leaders for those groups.
>>
>> Thanks again!
>>
>>
>> On Wed, May 21, 2014 at 1:20 PM, Jordan Zimmerman <
>> jor...@jordanzimmerman.com> wrote:
>>
>>> I’d need more details to answer concretely. But, this sounds like a
>>> simple lock. Have the process that wants to be leader acquire an
>>> InterProcessMutex.
>>>
>>> -JZ
>>>
>>>
>>> From: Corey Nolet cjno...@gmail.com
>>> Reply: user@curator.apache.org user@curator.apache.org
>>> Date: May 21, 2014 at 12:12:35 PM
>>> To: user user@curator.apache.org
>>> Subject:  Force leader
>>>
>>>  I have a cluster which is electing a single leader to perform
>>> operations on a node until the node is deemed to be unhealthy. At this
>>> time, the leader revokes itself and another leader is elected to perform
>>> the operations.
>>>
>>> There are times, however, when I need the ability to force a specific
>>> leader. How would I implement something like this? I really don't want to
>>> have to cascade through all the other nodes and tell them to revoke their
>>> leadership because they will each try to run some initialization upon
>>> becoming the leader and that would waste resources.
>>>
>>> Any ideas?
>>>
>>> Thanks!
>>>
>>>
>>
>


Re: Force leader

2014-05-21 Thread Corey Nolet
If it helps, here's a rough design of the project:

https://github.com/calrissian/conductor


On Wed, May 21, 2014 at 1:33 PM, Corey Nolet  wrote:

> Jordan,
>
> Thanks for your quick response! So what I am building is a fault-tolerant
> framework for linux systems to watch over some number of processes and,
> when a process goes down and can't be brought back up (disk space, memory,
> etc...), I want the process watcher to revoke its leadership so that a
> different machine can start up the process, do any configuration necessary
> to route clients over to that machine, and assume responsibility as the
> lead for that process.
>
> Often times what happens is a couple physical machines go down and ALL the
> managed processes end up on a single machine.
>
> I'm giving the users control over groups of processes. A group would be
> "amqp broker" or "web server". Once a sys admin comes into work and
> realizes what happened, they are going to want to force the amqp broker
> back to a specific node and force the web server onto some other node. The
> way I've designed it, this means those nodes need to be forced to be the
> leaders for those groups.
>
> Thanks again!
>
>
> On Wed, May 21, 2014 at 1:20 PM, Jordan Zimmerman <
> jor...@jordanzimmerman.com> wrote:
>
>> I’d need more details to answer concretely. But, this sounds like a
>> simple lock. Have the process that wants to be leader acquire an
>> InterProcessMutex.
>>
>> -JZ
>>
>>
>> From: Corey Nolet cjno...@gmail.com
>> Reply: user@curator.apache.org user@curator.apache.org
>> Date: May 21, 2014 at 12:12:35 PM
>> To: user user@curator.apache.org
>> Subject:  Force leader
>>
>>  I have a cluster which is electing a single leader to perform
>> operations on a node until the node is deemed to be unhealthy. At this
>> time, the leader revokes itself and another leader is elected to perform
>> the operations.
>>
>> There are times, however, when I need the ability to force a specific
>> leader. How would I implement something like this? I really don't want to
>> have to cascade through all the other nodes and tell them to revoke their
>> leadership because they will each try to run some initialization upon
>> becoming the leader and that would waste resources.
>>
>> Any ideas?
>>
>> Thanks!
>>
>>
>


Re: Force leader

2014-05-21 Thread Corey Nolet
Jordan,

Thanks for your quick response! So what I am building is a fault-tolerant
framework for linux systems to watch over some number of processes and,
when a process goes down and can't be brought back up (disk space, memory,
etc...), I want the process watcher to revoke its leadership so that a
different machine can start up the process, do any configuration necessary
to route clients over to that machine, and assume responsibility as the
lead for that process.

Often times what happens is a couple physical machines go down and ALL the
managed processes end up on a single machine.

I'm giving the users control over groups of processes. A group would be
"amqp broker" or "web server". Once a sys admin comes into work and
realizes what happened, they are going to want to force the amqp broker
back to a specific node and force the web server onto some other node. The
way I've designed it, this means those nodes need to be forced to be the
leaders for those groups.

Thanks again!


On Wed, May 21, 2014 at 1:20 PM, Jordan Zimmerman <
jor...@jordanzimmerman.com> wrote:

> I’d need more details to answer concretely. But, this sounds like a simple
> lock. Have the process that wants to be leader acquire an InterProcessMutex.
>
> -JZ
>
>
> From: Corey Nolet cjno...@gmail.com
> Reply: user@curator.apache.org user@curator.apache.org
> Date: May 21, 2014 at 12:12:35 PM
> To: user user@curator.apache.org
> Subject:  Force leader
>
>  I have a cluster which is electing a single leader to perform operations
> on a node until the node is deemed to be unhealthy. At this time, the
> leader revokes itself and another leader is elected to perform the
> operations.
>
> There are times, however, when I need the ability to force a specific
> leader. How would I implement something like this? I really don't want to
> have to cascade through all the other nodes and tell them to revoke their
> leadership because they will each try to run some initialization upon
> becoming the leader and that would waste resources.
>
> Any ideas?
>
> Thanks!
>
>


Force leader

2014-05-21 Thread Corey Nolet
I have a cluster which is electing a single leader to perform operations on
a node until the node is deemed to be unhealthy. At this time, the leader
revokes itself and another leader is elected to perform the operations.

There are times, however, when I need the ability to force a specific
leader. How would I implement something like this? I really don't want to
have to cascade through all the other nodes and tell them to revoke their
leadership because they will each try to run some initialization upon
becoming the leader and that would waste resources.

Any ideas?

Thanks!


Re: Query Services Layer Question

2014-05-19 Thread Corey Nolet
Jeff,

Unless you've got multiple different tables with different permissions to
manage for different physical Accumulo users, the connector should probably
be an instance variable in your service. It can be safely shared across all
the reads as long as the Accumulo user configured in the connector has
enough permissions to see what the users of your service need to see. The
communication with the tablet servers doesn't happen until the scanner
creation factory methods are called on the connector and an iteration has
been initiated over them.
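
As a rough sketch of that pattern (class and variable names are illustrative,
and error handling is omitted), something like this keeps one connector for
the life of the service and creates a short-lived scanner per request:

import java.util.Map;

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Instance;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;

public class QueryService {

    // created once and shared; the scanners are per-request and lightweight
    private final Connector connector;

    public QueryService(String instanceName, String zooServers,
                        String user, String password) throws Exception {
        Instance instance = new ZooKeeperInstance(instanceName, zooServers);
        this.connector = instance.getConnector(user, new PasswordToken(password));
    }

    public void readTable(String tableName, Authorizations auths, Range range) throws Exception {
        Scanner scanner = connector.createScanner(tableName, auths);
        scanner.setRange(range);
        for (Map.Entry<Key, Value> entry : scanner) {
            // process entry.getKey() / entry.getValue()
        }
    }
}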


On Mon, May 19, 2014 at 10:29 PM, Jeff Schwartz wrote:

> Rookie Question...  I've built a Query Service Layer (QSL) according to
> the documentation from the Accumulo v1.6.0 User Manual.  My question is how
> often should I be getting a ZooKeeper Instance and Connector to Accumulo.
>  For example, here's some pseudo code for a typical service in my QSL.
>
> public void readTable(...) {
> Instance instance = new ZooKeeperInstance(accumuloInstanceName,
> zooServers);
> Connector connector = instance.getConnector(username, passwordToken);
> Scanner scanner = connector.createScanner(tableName, auths);
> scanner.setRange(range);
> for (Map.Entry<Key, Value> entry : scanner) {
>   ...
> }
> scanner.close();
> }
>
> If I do these lines of code for every call in my restful service, then I
> feel like that is generating a lot of extra connections to both zookeeper
> and accumulo.  Additionally, I would assume that that will have a negative
> impact on performance.  Should I cache any connectors or ZooKeeper
> instances?
>
> Any suggestions or best practices would be greatly appreciated.
>
> Thanks in advance.
>
> Sincerely,
> Jeff Schwartz
>


[jira] [Commented] (ACCUMULO-2553) AccumuloFileOutputFormat should be able to support output for multiple tables.

2014-05-19 Thread Corey Nolet (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001983#comment-14001983
 ] 

Corey Nolet commented on ACCUMULO-2553:
---

The way I'm doing it, the files are put in directories named after the group.
I'm using Hadoop's MultipleOutputs to specify the output filename for each group.
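
A rough sketch of that reducer (group names and paths are illustrative, and
it assumes keys arrive at the reducer in sorted order per group, as RFiles
require; the driver would set AccumuloFileOutputFormat as the job's output
format and set its output path):

import java.io.IOException;

import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class GroupingReducer extends Reducer<Key, Value, Key, Value> {

  private MultipleOutputs<Key, Value> mos;

  @Override
  protected void setup(Context context) {
    mos = new MultipleOutputs<Key, Value>(context);
  }

  @Override
  protected void reduce(Key key, Iterable<Value> values, Context context)
      throws IOException, InterruptedException {
    String group = key.getColumnFamily().toString(); // illustrative grouping rule
    for (Value value : values) {
      // lands under <output>/<group>/part-r-nnnnn instead of the default location
      mos.write(key, value, group + "/part");
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    mos.close();
  }
}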





> AccumuloFileOutputFormat should be able to support output for multiple tables.
> --
>
> Key: ACCUMULO-2553
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2553
> Project: Accumulo
>  Issue Type: New Feature
>Reporter: Corey J. Nolet
>Assignee: Corey J. Nolet
>Priority: Minor
>
> This may not necessarily be something that would require changes in the 
> AccumuloFileOutputFormat itself. Perhaps the ability to use it with Hadoop's 
> MultipleOutputs is really the solution.
> It would be useful if the user could specify multiple directories where 
> RFiles should be placed and have a mechanism for populating the RFiles in the 
> necessary directories based on a table name or group name. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: unsubscribe

2014-05-17 Thread Corey Nolet
Eric, if your intention is to unsubscribe from the mailing list, you should
send a note to user-unsubscr...@storm.incubator.apache.org[1].

[1] http://mail-archives.apache.org/mod_mbox/incubator-storm-user/


On Fri, May 16, 2014 at 8:58 AM, eric perler  wrote:

> unsubscribe
>


Re: Tracking cardinality in Accumulo

2014-05-16 Thread Corey Nolet
What's the expected size of your unique key set? Thousands? Millions?
Billions?

You could probably use a table structure similar to
https://github.com/calrissian/accumulo-recipes/tree/master/store/metrics-store
but just have it emit 1's instead of summing them.

I'm thinking maybe your mappings could be like this:
group=anything, type=NAME, name=John(etc...)

Perhaps a ColumnQualifierGrouping iterator could be applied at scan time to
add up the cardinalities for the qualifiers over the given time range being
scanned, with cardinalities across different time units aggregated client
side.
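
As a separate, minimal sketch of the HyperLogLog route discussed in this
thread, using stream-lib (the serialized bytes could just as well be stored
in an Accumulo Value and rebuilt or merged later):

import com.clearspring.analytics.stream.cardinality.HyperLogLog;

public class NameCardinality {
    public static void main(String[] args) throws Exception {
        HyperLogLog hll = new HyperLogLog(0.01); // ~1% relative standard deviation

        hll.offer("John");
        hll.offer("Jake");
        hll.offer("John"); // duplicate; does not increase the estimate
        hll.offer("Mary");

        System.out.println(hll.cardinality()); // ~3

        // The sketch serializes compactly, so it can be persisted and restored.
        byte[] bytes = hll.getBytes();
        HyperLogLog restored = HyperLogLog.Builder.build(bytes);
        System.out.println(restored.cardinality());
    }
}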




On Fri, May 16, 2014 at 5:19 PM, David Medinets wrote:

> Yes, the data has not yet been ingested. I can control the table
> structure; hopefully by integrating (or extending) the D4M schema.
>
> I'm leaning towards using https://github.com/addthis/stream-lib as part
> of the ingest process. Upon start up, existing tables would be analyzed to
> find cardinality. Then as records are ingested, the cardinality would be
> adjusted as needed. I don't yet know how to store the cardinality
> information so that restarting the ingest process doesn't require
> re-processing all the data. Still researching.
>
>
> On Fri, May 16, 2014 at 4:19 PM, Corey Nolet  wrote:
>
>> Can we assume this data has not yet been ingested? Do you have control
>> over the way in which you structure your table?
>>
>>
>>
>> On Fri, May 16, 2014 at 1:54 PM, David Medinets > > wrote:
>>
>>> If I have the following simple set of data:
>>>
>>> NAME John
>>> NAME Jake
>>> NAME John
>>> NAME Mary
>>>
>>> I want to end up with the following:
>>>
>>> NAME 3
>>>
>>> I'm thinking that perhaps a HyperLogLog approach should work. See
>>> http://en.wikipedia.org/wiki/HyperLogLog for more information.
>>>
>>> Has anyone done this before in Accumulo?
>>>
>>
>>
>


Re: MR Data Locality with AccumuloInputFormat?

2014-05-16 Thread Corey Nolet
Has the table been compacted since loading the data?

Hi Russ,

I believe that the AccumuloInputFormat will use the splits on the table
you're reading to generate the MR InputSplits. The InputFormat should be
trying to run the Mappers on the same machine as the tserver serving the
data is located.

If you're only getting a few mappers, adding more splits to your table
should help. As your job runs, you can verify locality using the counters
that your Job creates via the JobTracker/ResourceManager web UI.
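
If it helps, here is a minimal sketch of pre-splitting a table
programmatically (the split points and table name are illustrative, and the
shell's addsplits command does the same thing):

import java.util.SortedSet;
import java.util.TreeSet;

import org.apache.accumulo.core.client.Connector;
import org.apache.hadoop.io.Text;

public class AddSplits {
    // Assumes a Connector has already been obtained elsewhere.
    public static void preSplit(Connector connector) throws Exception {
        SortedSet<Text> splits = new TreeSet<Text>();
        for (char c = 'a'; c <= 'z'; c++) {
            splits.add(new Text(String.valueOf(c)));
        }
        connector.tableOperations().addSplits("mytable", splits);
    }
}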

On 5/16/14, 1:32 PM, Russ Weeks wrote:

> Hi, folks,
>
> When I execute an MR job with AccumuloInputFormat, are there any
> guarantees about which mappers process which rows? I'm trying to
> minimize crosstalk in my cluster but either I haven't split my table
> properly or I'm expecting too much, because I'm only seeing 1 or 2 nodes
> running MR tasks that should be reading data from tablet servers on 8
> different nodes.
>
> Thanks,
> -Russ
>


Re: Tracking cardinality in Accumulo

2014-05-16 Thread Corey Nolet
Can we assume this data has not yet been ingested? Do you have control over
the way in which you structure your table?



On Fri, May 16, 2014 at 1:54 PM, David Medinets wrote:

> If I have the following simple set of data:
>
> NAME John
> NAME Jake
> NAME John
> NAME Mary
>
> I want to end up with the following:
>
> NAME 3
>
> I'm thinking that perhaps a HyperLogLog approach should work. See
> http://en.wikipedia.org/wiki/HyperLogLog for more information.
>
> Has anyone done this before in Accumulo?
>


Re: [VOTE] Storm Logo Contest - Round 1

2014-05-16 Thread Corey Nolet
Non-binding:
#6 3pts
#4 2pts


On Fri, May 16, 2014 at 1:31 PM, Robert Turner  wrote:

> #6 - 5 pts
>
> Rob Turner.
>
>
> On 15 May 2014 17:28, P. Taylor Goetz  wrote:
>
>> This is a call to vote on selecting the top 3 Storm logos from the 11
>> entries received. This is the first of two rounds of voting. In the first
>> round the top 3 entries will be selected to move onto the second round
>> where the winner will be selected.
>>
>> The entries can be viewed on the storm website here:
>>
>> http://storm.incubator.apache.org/blog.html
>>
>> VOTING
>>
>> Each person can cast a single vote. A vote consists of 5 points that can
>> be divided among multiple entries. To vote, list the entry number, followed
>> by the number of points assigned. For example:
>>
>> #1 - 2 pts.
>> #2 - 1 pt.
>> #3 - 2 pts.
>>
>> Votes cast by PPMC members are considered binding, but voting is open to
>> anyone.
>>
>> This vote will be open until Thursday, May 22 11:59 PM UTC.
>>
>> - Taylor
>>
>
>
>
> --
> Cheers
>Rob.
>


Re: [DISCUSS] Do we want contributors assigning to themselves?

2014-05-16 Thread Corey Nolet
+1 for restoring the old behavior. Why wouldn't we allow contributors to
help themselves help the community?


On Thu, May 15, 2014 at 11:13 AM, John Vines  wrote:

> Yes, restore the old behavior
>
>
> On Wed, May 14, 2014 at 4:38 PM, Sean Busbey  wrote:
>
> > We don't have a formal onboarding process for drawing in new
> contributors,
> > but a recent ASF Infra change impacts what I've observed historically.
> >
> > Here's what I've seen historically, more or less:
> >
> > 1) Someone expresses interest in a ticket
> >
> > 2) PMC/committers add them to the list of contributors in jira
> >
> > 3) respond to interest informing person of this change and encouraging
> them
> > to assign the ticket to themselves
> >
> > 4) work happens on ticket
> >
> > 5) review/commit happens eventually
> >
> > 6) If contributor wants, added to website
> >
> > 7) contributor thanked and encouraged to find more tickets to assign to
> > themselves.
> >
> > Due to a request from Spark, the ASF Jira got changed to default to not
> > allow contributors to assign tickets[1].
> >
> > Before I speak for the PMC and file a follow on to change things back, I
> > just wanted a gut check that we like the above as a general approach.
> >
> >
> > [1]: https://issues.apache.org/jira/browse/INFRA-7675
> >
> > --
> > Sean
> >
>


Re: Interesting Comparison

2014-05-13 Thread Corey Nolet
It's hard to make a comparison without knowing exactly how their tests were
written, especially when they come from the company whose product is being
recognized as "superior". Storm is still young in the larger community and I
certainly think there's room for it to grow.


On Mon, May 12, 2014 at 9:03 PM, Jon Logan  wrote:

> The claims are certainly interesting...I haven't looked through it super
> detailed, but I would definitely keep in mind who is making the claims.
> Looking at it briefly, it looks like something is really wrong, looking at
> their scaling graphs. Without further information, I think it's hard to
> properly analyze their results, especially coming from a competing vendor.
>
>
> I don't know where this 40k figure comes from...coming from IBM's own
> cost-analysis paper, the pricing is more like starting at 500k, and easily
> going 1mil+.
>
> http://public.dhe.ibm.com/common/ssi/ecm/en/ime14024usen/IME14024USEN.PDF
>
>
> It would be interesting if they posted their source code, to see if
> they're doing something silly, or if anyone could rectify their performance
> issues. Otherwise, I think it's fair to assume this is potentially a
> "novice" versus "first-party supported expert" comparisons of
> implementations.
>
>
>
> On Mon, May 12, 2014 at 2:49 PM, Ted Dunning wrote:
>
>>
>> Anybody who has ever only paid 40K$ to IBM for anything should deserve a
>> prize.  That is just the entry fee.
>>
>>
>>
>>
>> On Mon, May 12, 2014 at 7:46 AM, Marc Vaillant 
>> wrote:
>>
>>>  To play devil's advocate, if you believe the stream performance gains,
>>> then the 40k will likely pay for itself in needing to deploy a fraction
>>> of the resources for the same throughput.
>>>
>>> On Mon, May 12, 2014 at 09:02:53AM -0400, John Welcher wrote:
>>> > Hi
>>> >
>>> > Streams also cost 40,000 US while Storm is free.
>>> >
>>> > John
>>> >
>>> >
>>> > On Mon, May 12, 2014 at 3:49 AM, Klausen Schaefersinho <
>>> > klaus.schaef...@gmail.com> wrote:
>>> >
>>> > Hi,
>>> >
>>> > I found some interesting comparison of IBM Stream and Storm:
>>> >
>>> > https://www.ibmdw.net/streamsdev/2014/04/22/streams-apache-storm/
>>> >
>>> > It also includes an interesting comparison between ZeroMQ and the
>>> Netty
>>> > Performance.
>>> >
>>> >
>>> > Cheers,
>>> >
>>> > Klaus
>>> >
>>> >
>>>
>>
>>
>


Re: Interesting Comparison

2014-05-12 Thread Corey Nolet
Interesting that the paper was written by IBM people defending an IBM
product. Not saying that it's biased or anything...

Nathan, I agree that the windowing is better served as a layer on top.
Personally, I appreciate that Storm deals with clustering, distributed
state, fault-tolerance, and threading so that all I have to think about is
processing the tuples received in a bolt (and I don't even have to worry
about my algorithms being thread safe). This is not the case in InfoSphere
Streams.

I also agree that the windowing is better served as an abstraction layer on
top of a generic streams processing platform. I also appreciate this about
Storm- I'm not limited to a single language/API for processing my streams.

Former coworkers of mine and I were integrating Java with InfoSphere
Streams a couple of years ago and actually found it to be faster in many
cases than the C++ counterparts. I will say InfoSphere's windowing
abstractions are very well thought out. In fact, I've been working on
providing a similar solution, largely based on their design, here:
https://github.com/calrissian/flowmix
It's still largely experimental, but it is holding up well on the clusters
on which it is deployed. As Nathan put it, it's been holding up well enough
for my use cases.




On Mon, May 12, 2014 at 11:28 AM, Nathan Leung  wrote:

> a couple thoughts
>
> 1) IBM streams is certainly more mature, as it's been in development for a
> longer amount of time and storm is not even at release 1.0 yet.  Though I
> am not familiar with SPL, It would also make sense that it's faster to
> implement as it is a higher level abstraction.
>
> 2) Operator fusion will allow more efficiency in passing data between
> steps in your flow, as localOrShuffleGrouping will still need to go over
> disruptor whereas operator fusion from what I understand basically passes
> the pointer directly.  As fast as disruptor is (I've seen benchmarks of
> millions of messages passed / s), it won't be directly passing data to the
> next step (cost: a few instructions).  The downside of this is your flow
> always needs to be created and compiled before you can execute it.
>  Something like a rebalance will require a recompile of your stream.
>  Building a topology dynamically (which is possible in storm, but not a
> feature that is really exposed out of the box) is possible in storm, but
> not in IBM streams.
>
> 3) they took 1 month to optimize storm but I suspect some of this work was
> unnecessary.  Python?  For a benchmark?  Also, uniform message distribution
> by size feels like a premature optimization.  I can understand that they
> would want to explore all avenues to account for a performance difference,
> but in many (most?) practical cases this would not be necessary.  I can
> sympathize on other points.  Tuning the message buffers of storm requires
> pretty specific understanding of the system.  Also if you run out of heap
> and/or have to tune GC, then... yeah.  Not fun.  This would be true for any
> java app though.
>
> 4) I'm not sure they really took language differences seriously enough.
>  I've written certain algorithms in Java that (based on similar algorithms
> that I implemented separately in C++) I would suspect are close to an order
> of magnitude slower just because I ran them in Java.  While I haven't dug
> into this deeply (for example by using an identical algorithm for both Java
> and C++), consider a HashMap indexed by a primitive type.  In Java, these
> are separate objects stored in an array of references.  In C++ these are
> stored sequentially in an array.  C++ allows direct key access in the array
> (as opposed to going through the reference), and is also potentially much
> friendlier with the cache.  Just because the JVM is healthy does not mean
> it's going to perform like C++ for all applications.  I suppose you could
> then argue that for best performance Storm is more or less limited to the
> JVM, but I choose not to consider that point here for brevity.  Note this
> is not to say that it's impossible to write fast code in Java (see
> previously mentioned disruptor).  I would just argue that it's a good bit
> harder.
>
> 5) I'm not sure I buy their argument that application logic costs are
> unlikely to mask the differences in framework performance.  This depends
> very heavily on your application.  If you're hitting external data sources
> a lot (e.g. memcache or database) then that will certainly mask a good
> portion of the difference.  Maybe part of this argument is a C++ vs Java
> difference, in which case I'm somewhat more inclined to agree.
>
> 6) From a business perspective, the question changes from "is it faster?"
> to "what does it cost to support the throughput that we need?" which is a
> very different question.  In many cases storm performs well enough.
>
>
> On Mon, May 12, 2014 at 9:02 AM, John Welcher  wrote:
>
>> Hi
>>
>> Streams also cost 40,000 US while Storm is free.
>>
>> John
>>
>>
>> On Mon,

Re: InputFormat to TAP memcached under couchbase

2014-04-30 Thread Corey Nolet
I wanted to post back here that I had solved the problem I was having with 
the input format in the Sqoop plugin. It was using the Membase client to
perform the TAP. Changing this to the Couchbase client made it work. I
figured it'd be useful to have it here in case other users run into the 
same issue. In the meantime, I did put the updated version of the input 
format here:

https://github.com/calrissian/couchbase-toolkit

I've done some work with Couchbase + Elasticsearch + Tinkerpop's Gremlin. 
I'm grabbing snapshots of graphs from Couchbase every hour with the 
InputFormat and it appears to be working well. Though it would definitely 
be faster if I was able to perform filters at the TAP level... I know 
that's a complicated thing to ask for.

On Wednesday, March 12, 2014 9:48:51 PM UTC-4, Corey Nolet wrote:
>
> I *think* I may have isolated this issue to a client version - though it 
> doesn't make sense to me why the sqoop plugin isn't working. I'm going to 
> try upgrading my client libs to the newest version.
>
> On Wednesday, March 12, 2014 4:03:07 PM UTC-4, Corey Nolet wrote:
>>
>> Would it possible for someone to provide me with an effective example on 
>> how to use the TapClient in couchbase/memcached with a couchbase server 
>> installation?
>>
>> I've been banging my head against the wall for days on this. I need to be 
>> able to dump out my couchbase keys/values every hour into HDFS so I can 
>> map/reduce over them. I'm using CDH3u4 and the Sqoop connector is freezing 
>> up when it begins its map/reduce job. I do not have the luxury of updating 
>> to the Sqoop CDH4 version unfortunately but I've seen people complaining of 
>> the same problems with that version.
>>
>> What I've tried is using the TapClient with both the Couchbase libraries 
>> and the spy memcached libraries in java. Even with exponential backoff, I 
>> can't seem to get the TapClient to return a message where I can pull off a 
>> key and a value (it appears I get 'null" for getNextmessage() even with an 
>> appropriate timeout of 5 minutes).
>>
>> What can I do to get this to work? I've been using Couchbase behind 
>> Twitter Storm to help with caching for CEP. I've also been using it as a 
>> real-time query engine of the underlying CEP cache with ElasticSearch for 
>> my customer. If I can't dump the data out to HDFS directly, then I may need 
>> to look at other options. I am trying to stay away from views because I 
>> want to hit memory directly. I'd also like to preserve data locality if 
>> possible (connect directly to memcached or tell couchbase exactly which 
>> node(s) i'd like to retrieve keys from.
>>
>> What are my options here?
>>
>>
>> I'm wondering if BigCouch would allow me to do this effectively.
>>
>> Thanks much!
>>
>>
>> On Monday, March 10, 2014 11:52:57 PM UTC-4, Corey Nolet wrote:
>>>
>>> I recently tried the Sqoop connector for Couchbase 2 and it doesn't 
>>> appear to be working as expected. I have written my own InputFormat here:
>>>
>>>
>>> https://github.com/cjnolet/cloud-toolkit/blob/master/src/main/java/org/calrissian/hadoop/memcached/MemcachedInputFormat.java
>>>
>>> I haven't gotten a chance to test it yet but I wanted to know if MOXI 
>>> would make it hard to get the locality that Im expecting from each of the 
>>> memcached instances. When I connect to a memcached instance (backing 
>>> couchbase) on port 11211, will each of those memcached instances give me 
>>> ALL of the keys in couchbase? or will they only give me the keys that they 
>>> contain separately?
>>>
>>>
>>> Thanks!
>>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"Couchbase" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to couchbase+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Flush aggregated data every X seconds

2014-04-24 Thread Corey Nolet
Raphael, in your case it sounds like a "TickSpout" could be useful where
you emit a tuple every n time slices and then sleep until needing to emit
another. I'm not sure how that'd work in a Trident aggregator, however.
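
A minimal sketch of that kind of spout (assuming Storm 0.9.x package names;
the field name is illustrative, and sleeping inside nextTuple() is a
simplification):

import java.util.Map;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;

public class TickSpout extends BaseRichSpout {

    private final long intervalMillis;
    private SpoutOutputCollector collector;

    public TickSpout(long intervalMillis) {
        this.intervalMillis = intervalMillis;
    }

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        Utils.sleep(intervalMillis);                             // wait until the next tick is due
        collector.emit(new Values(System.currentTimeMillis()));  // downstream bolts flush on this signal
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("tick"));
    }
}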

I'm not sure if this is something Nathan or the community would approve of,
but I've been writing my own framework for doing sliding/tumbling windows
in Storm that allow aggregations and triggering/eviction by count, time,
and other policies like "when the time difference between the first item
and the last item in a window is less than x". The bolts could easily be
ripped out for doing your own aggregations.

It's located here: https://github.com/calrissian/flowbox

It's very much in the proof-of-concept stage. My other requirement (and the
reason I cared so much about implementing this) was that the rules need to
be dynamic and the topology needs to be static, so as to make the best use
of resources while users define what they need.



On Thu, Apr 24, 2014 at 11:27 PM, Raphael Hsieh wrote:

> Is there a way in Storm Trident to aggregate data over a certain time
> period and have it flush the data out to an external data store after that
> time period is up ?
>
> Trident does not have the functionality of Tick Tuples yet, so I cannot
> use that. Everything I've been researching leads to believe that this is
> not possible in Storm/Trident, however this seems to me to be a fairly
> standard use case of any streaming map reduce library.
>
> For example,
> If I am receiving a stream of integers
> I want to aggregate all those integers over a period of 1 second, then
> persist it into an external datastore.
>
> This is not in order to count how much it will add up to over X amount of
> time, rather I would like to minimize the read/write/updates I do to said
> datastore.
>
> There are many ways in order to reduce these variables, however all of
> them force me to modify my schema in ways that are unpleasant. Also, I
> would rather not have my final external datastore be my scratch space,
> where my program is reading/updating/writing and checking to make sure that
> the transaction id's line up.
> Instead I want that scratch work to be done separately, then the final
> result stored into a final database that no longer needs to do constant
> updating.
>
> Thanks
> --
> Raphael Hsieh
>
>
>
>


Re: Accumulo and OSGi

2014-04-23 Thread Corey Nolet
+1 for slf4j.


On Wed, Apr 23, 2014 at 10:51 AM, Josh Elser  wrote:

> I'd love to see us move to slf4j. Hadoop is in the middle of a proposal
> about this too which sounds good to me.
>
> http://mail-archives.apache.org/mod_mbox/hadoop-common-
> dev/201404.mbox/%3CCA%2B4kjVv7N2dRR5rmdFHCpBx-K3yT7YRLs0Dvrvdjsn3iChUsEA%
> 40mail.gmail.com%3E
>
>
> On 4/23/14, 10:33 AM, Geoffry Roberts wrote:
>
>> If I were to pitch in on this,  how would it work?  and what logger?  Do
>> I submit patches?  Is slf4j the target?
>>
>>
>> On Wed, Apr 23, 2014 at 9:45 AM, Sean Busbey > > wrote:
>>
>> yes, there are also some bits using commons-logging. I think we
>> managed to scrub out java.util.logging.
>>
>>
>> On Wed, Apr 23, 2014 at 8:39 AM, Geoffry Roberts
>> mailto:threadedb...@gmail.com>> wrote:
>>
>> I thought I'd check in.
>>
>> After some encouragement from this group, I found some time and
>> now have an Accumulo client running in OSGi (Felix).  It's
>> rather primitive, at this juncture, in that it is little more
>> than a wrap job.  I was, however, forced to hack Zookeeper to
>> get things to work.  Zookeeper needed to import an additional
>> package.  I used the servicemix bundle for Hadoop.
>>
>> Josh, You asked if there was anything that could be done
>> upstream to make osgification go better.  One thing, and it's
>> not a huge deal, but getting everything on the same logging
>> library would be nice.  So far, I see both log4j and slf4j.  Are
>> there more?
>>
>>
>>
>> On Thu, Apr 10, 2014 at 12:49 PM, Russ Weeks
>> mailto:rwe...@newbrightidea.com>>
>> wrote:
>>
>> On Thu, Apr 10, 2014 at 7:18 AM, Geoffry Roberts
>> mailto:threadedb...@gmail.com>>
>> wrote:
>>
>> You say the community would be well-accepting of
>> bundling up the Accumulo client.  If that's the case,
>> I'd like to hear from them.
>>
>>
>> +1!
>>
>>
>>
>>
>> --
>> There are ways and there are ways,
>>
>> Geoffry Roberts
>>
>>
>>
>>
>> --
>> Sean
>>
>>
>>
>>
>> --
>> There are ways and there are ways,
>>
>> Geoffry Roberts
>>
>


Re: 551 JIRA Tickets Over 2 Years Old

2014-04-21 Thread Corey Nolet
+1 I thought "proposal" would be good enough to convey the message. "Wont
fix" is confusing and I could see possible contributors being starred away
by it.
On Apr 21, 2014 1:04 PM, cjno...@gmail.com wrote:

> +1
> On Apr 21, 2014 11:47 AM, "John Vines"  wrote:
>
>> what about just changing them from being improvements to wishes?
>>
>>
>> On Mon, Apr 21, 2014 at 9:26 AM, Bill Havanki > >wrote:
>>
>> > +1 to using "Won't Fix". "Won't" can mean "won't anytime soon".
>> Labeling as
>> > "someday" or "wishlist" or something sounds great to me. The tickets
>> remain
>> > in JIRA, so they can be resurrected if we change our minds or if an
>> eager
>> > contributor comes along. Nothing is lost.
>> >
>> > I'll look into getting our ASF wiki space established if no one is
>> doing so
>> > already. This isn't the only time it's been proposed for use lately.
>> >
>> > Thanks to David and everybody doing the spring cleaning.
>> >
>> >
>> > On Mon, Apr 21, 2014 at 1:07 AM, Sean Busbey 
>> wrote:
>> >
>> > > What do we want Jira to represent? I prefer it when projects use Jira
>> as
>> > a
>> > > work queue. If a feature request hasn't gotten interest in 2 years,
>> it's
>> > > very unlikely it will suddenly jump to the top of our priority list.
>> > >
>> > > I'm all for suggesting that requestors work on a patch and offering
>> > > feedback to guide them. But if there isn't someone willing to do the
>> > work,
>> > > the ticket is effectively wontfix. We should make sure there's a
>> comment
>> > > that explains that we're open to a feature if someone comes forward
>> to do
>> > > the work. We could also add a label so it's easier for the interested
>> to
>> > > find them.
>> > >
>> > > There is a cost to keeping these defunct tickets around. Old, untended
>> > > tickets discourage new participants. They make us look unresponsive
>> and
>> > > they represent noise for those trying to look at what's going on.
>> > >
>> > > We do need a place for ideas we find interesting but don't have
>> resources
>> > > to handle yet. Many projects request that feature requests start on
>> the
>> > > mailing list to gauge interest. We could just do that, though the mail
>> > > archive is neither super easy to search nor a convenient point of
>> > > reference.
>> > >
>> > > Maybe this would be a good use of our ASF wiki space?
>> > >
>> > >
>> > > On Sat, Apr 19, 2014 at 3:50 PM, Corey Nolet 
>> wrote:
>> > >
>> > > > I agree. Are those tickets really getting in the way? Maybe they
>> could
>> > be
>> > > > labeled differently to separate them from tech debt, bugs, and other
>> > > active
>> > > > features?
>> > > > On Apr 19, 2014 3:51 PM, "John Vines"  wrote:
>> > > >
>> > > > > Won't fix isn't accurate though. We're not saying we will reject
>> work
>> > > on
>> > > > > them, they're just not a high priority.
>> > > > >
>> > > > >
>> > > > > On Sat, Apr 19, 2014 at 3:03 PM, Christopher > >
>> > > > wrote:
>> > > > >
>> > > > > > Resolving them as "Won't Fix" seems valid to me, if the fact
>> that a
>> > > > > > ticket is open helps us track/manage outstanding work. (The
>> obvious
>> > > > > > question, then, is "does it help in some way?"). They can
>> always be
>> > > > > > re-opened if we decide it's worth doing.
>> > > > > >
>> > > > > > --
>> > > > > > Christopher L Tubbs II
>> > > > > > http://gravatar.com/ctubbsii
>> > > > > >
>> > > > > >
>> > > > > > On Sat, Apr 19, 2014 at 1:05 PM, John Vines 
>> > > wrote:
>> > > > > > > Just because they're old doesn't make them invalid. They're
>> just
>> > > at a
>> > > > > > lower
>> > > 

Re: 551 JIRA Tickets Over 2 Years Old

2014-04-21 Thread Corey Nolet
+1
On Apr 21, 2014 11:47 AM, "John Vines"  wrote:

> what about just changing them from being improvements to wishes?
>
>
> On Mon, Apr 21, 2014 at 9:26 AM, Bill Havanki  >wrote:
>
> > +1 to using "Won't Fix". "Won't" can mean "won't anytime soon". Labeling
> as
> > "someday" or "wishlist" or something sounds great to me. The tickets
> remain
> > in JIRA, so they can be resurrected if we change our minds or if an eager
> > contributor comes along. Nothing is lost.
> >
> > I'll look into getting our ASF wiki space established if no one is doing
> so
> > already. This isn't the only time it's been proposed for use lately.
> >
> > Thanks to David and everybody doing the spring cleaning.
> >
> >
> > On Mon, Apr 21, 2014 at 1:07 AM, Sean Busbey 
> wrote:
> >
> > > What do we want Jira to represent? I prefer it when projects use Jira
> as
> > a
> > > work queue. If a feature request hasn't gotten interest in 2 years,
> it's
> > > very unlikely it will suddenly jump to the top of our priority list.
> > >
> > > I'm all for suggesting that requestors work on a patch and offering
> > > feedback to guide them. But if there isn't someone willing to do the
> > work,
> > > the ticket is effectively wontfix. We should make sure there's a
> comment
> > > that explains that we're open to a feature if someone comes forward to
> do
> > > the work. We could also add a label so it's easier for the interested
> to
> > > find them.
> > >
> > > There is a cost to keeping these defunct tickets around. Old, untended
> > > tickets discourage new participants. They make us look unresponsive and
> > > they represent noise for those trying to look at what's going on.
> > >
> > > We do need a place for ideas we find interesting but don't have
> resources
> > > to handle yet. Many projects request that feature requests start on the
> > > mailing list to gauge interest. We could just do that, though the mail
> > > archive is neither super easy to search nor a convenient point of
> > > reference.
> > >
> > > Maybe this would be a good use of our ASF wiki space?
> > >
> > >
> > > On Sat, Apr 19, 2014 at 3:50 PM, Corey Nolet 
> wrote:
> > >
> > > > I agree. Are those tickets really getting in the way? Maybe they
> could
> > be
> > > > labeled differently to separate them from tech debt, bugs, and other
> > > active
> > > > features?
> > > > On Apr 19, 2014 3:51 PM, "John Vines"  wrote:
> > > >
> > > > > Won't fix isn't accurate though. We're not saying we will reject
> work
> > > on
> > > > > them, they're just not a high priority.
> > > > >
> > > > >
> > > > > On Sat, Apr 19, 2014 at 3:03 PM, Christopher 
> > > > wrote:
> > > > >
> > > > > > Resolving them as "Won't Fix" seems valid to me, if the fact
> that a
> > > > > > ticket is open helps us track/manage outstanding work. (The
> obvious
> > > > > > question, then, is "does it help in some way?"). They can always
> be
> > > > > > re-opened if we decide it's worth doing.
> > > > > >
> > > > > > --
> > > > > > Christopher L Tubbs II
> > > > > > http://gravatar.com/ctubbsii
> > > > > >
> > > > > >
> > > > > > On Sat, Apr 19, 2014 at 1:05 PM, John Vines 
> > > wrote:
> > > > > > > Just because they're old doesn't make them invalid. They're
> just
> > > at a
> > > > > > lower
> > > > > > > priority. Closing them for the sake of closing them seems like
> a
> > > bad
> > > > > > idea.
> > > > > > >
> > > > > > > But if they're actually invalid now, that's an entirely
> different
> > > > > notion.
> > > > > > >
> > > > > > > Sent from my phone, please pardon the typos and brevity.
> > > > > > > On Apr 19, 2014 12:42 PM, "David Medinets" <
> > > david.medin...@gmail.com
> > > > >
> > > > > > wrote:
> > > > > > >

Re: 551 JIRA Tickets Over 2 Years Old

2014-04-19 Thread Corey Nolet
I agree. Are those tickets really getting in the way? Maybe they could be
labeled differently to separate them from tech debt, bugs, and other active
features?
On Apr 19, 2014 3:51 PM, "John Vines"  wrote:

> Won't fix isn't accurate though. We're not saying we will reject work on
> them, they're just not a high priority.
>
>
> On Sat, Apr 19, 2014 at 3:03 PM, Christopher  wrote:
>
> > Resolving them as "Won't Fix" seems valid to me, if the fact that a
> > ticket is open helps us track/manage outstanding work. (The obvious
> > question, then, is "does it help in some way?"). They can always be
> > re-opened if we decide it's worth doing.
> >
> > --
> > Christopher L Tubbs II
> > http://gravatar.com/ctubbsii
> >
> >
> > On Sat, Apr 19, 2014 at 1:05 PM, John Vines  wrote:
> > > Just because they're old doesn't make them invalid. They're just at a
> > lower
> > > priority. Closing them for the sake of closing them seems like a bad
> > idea.
> > >
> > > But if they're actually invalid now, that's an entirely different
> notion.
> > >
> > > Sent from my phone, please pardon the typos and brevity.
> > > On Apr 19, 2014 12:42 PM, "David Medinets" 
> > wrote:
> > >
> > >> ACCUMULO-483 <https://issues.apache.org/jira/browse/ACCUMULO-483>,
> for
> > >> example, involves creating a purge locality utility. However, there
> have
> > >> been no comments since Oct 2012. If the feature has not risen in
> > priority
> > >> since then, how will it become more important in the future. Perhaps a
> > >> 'good ideas' page or 'roadmap' page could be added to
> > >> http://accumulo.apache.org/? I don't see a benefit to keeping these
> old
> > >> tickets.
> > >>
> > >>
> > >> On Sat, Apr 19, 2014 at 10:11 AM, Corey Nolet 
> > wrote:
> > >>
> > >> > Some of these tickets still look like very valid feature/integration
> > >> > requests that would still be reasonable to have.
> > >> >
> > >> > See ACCUMULO-74, ACCUMULO-143, ACCUMULO-136, ACCUMULO-211,
> > ACCUMULO-483,
> > >> > ACCUMULO-490, ACCUMULO-508
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > On Sat, Apr 19, 2014 at 9:54 AM, Mike Drob  wrote:
> > >> >
> > >> > > Deleting tickets is a no-no, but flagging them is certainly fine.
> > >> > > On Apr 19, 2014 12:03 AM, "David Medinets" <
> > david.medin...@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > Opps. Sorry, I did my filtering badly. There are 68 tickets
> over 2
> > >> > years
> > >> > > > old.
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://issues.apache.org/jira/browse/ACCUMULO-18?jql=project%20%3D%20ACCUMULO%20AND%20status%20in%20%28Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22%29%20AND%20created%20%3C%3D%20-104w%20ORDER%20BY%20key%20ASC
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > On Sat, Apr 19, 2014 at 12:01 AM, David Medinets
> > >> > > > wrote:
> > >> > > >
> > >> > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> https://issues.apache.org/jira/browse/ACCUMULO-551?jql=project%20%3D%20ACCUMULO%20AND%20created%20%3C%3D%20-104w%20ORDER%20BY%20key%20DESC
> > >> > > > >
> > >> > > > > Is there a technique we can use to curate old tickets? Would
> > anyone
> > >> > > mind
> > >> > > > > if I review them and nominate tickets for closure? I can add a
> > >> > message
> > >> > > > and
> > >> > > > > delete any tickets that don't provoke a response. How useful
> are
> > >> > > tickets
> > >> > > > > that are two years old?
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
>


Re: 551 JIRA Tickets Over 2 Years Old

2014-04-19 Thread Corey Nolet
Some of these tickets still look like very valid feature/integration
requests that would still be reasonable to have.

See ACCUMULO-74, ACCUMULO-143, ACCUMULO-136, ACCUMULO-211, ACCUMULO-483,
ACCUMULO-490, ACCUMULO-508




On Sat, Apr 19, 2014 at 9:54 AM, Mike Drob  wrote:

> Deleting tickets is a no-no, but flagging them is certainly fine.
> On Apr 19, 2014 12:03 AM, "David Medinets" 
> wrote:
>
> > Opps. Sorry, I did my filtering badly. There are 68 tickets over 2 years
> > old.
> >
> >
> >
> https://issues.apache.org/jira/browse/ACCUMULO-18?jql=project%20%3D%20ACCUMULO%20AND%20status%20in%20%28Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22%29%20AND%20created%20%3C%3D%20-104w%20ORDER%20BY%20key%20ASC
> >
> >
> >
> >
> > On Sat, Apr 19, 2014 at 12:01 AM, David Medinets
> > wrote:
> >
> > >
> > >
> >
> https://issues.apache.org/jira/browse/ACCUMULO-551?jql=project%20%3D%20ACCUMULO%20AND%20created%20%3C%3D%20-104w%20ORDER%20BY%20key%20DESC
> > >
> > > Is there a technique we can use to curate old tickets? Would anyone
> mind
> > > if I review them and nominate tickets for closure? I can add a message
> > and
> > > delete any tickets that don't provoke a response. How useful are
> tickets
> > > that are two years old?
> > >
> >
>


Re: [VOTE] Accumulo Blog

2014-04-18 Thread Corey Nolet
I'd like initial posting privileges. Thanks for setting this up!
On Apr 18, 2014 11:23 AM, "Bill Havanki"  wrote:

> Sure thing Dave, happy to.
>
> We need to determine an initial list of people with posting privileges.
> I'll start with Dave and myself. If any other PMC member wants in, just let
> me know by COB eastern time, and I'll add you to the infra ticket to
> establish the blog. Don't worry if you miss out, another infra ticket is
> all it takes to get added. (Or, maybe, if you already have a blog account,
> we can add you.)
>
> Bill H
>
> On Thu, Apr 17, 2014 at 12:27 PM,  wrote:
>
> >
> > This vote passes with eight +1 votes (5 binding, 3 non-binding) and one
> +0
> > vote.
> >
> > Bill H - I think you volunteered to help with the setup. The instructions
> > are located at http://www.apache.org/dev/project-blogs . If you are
> > unable to do this let me know.
> >
> > Thanks,
> >
> > Dave
> >
> > - Original Message -
> >
> > From: dlmar...@comcast.net
> > To: dev@accumulo.apache.org
> > Sent: Sunday, April 13, 2014 8:11:07 PM
> > Subject: [VOTE] Accumulo Blog
> >
> > I have reviewed the feedback from the proposal thread and consolidated it
> > into a set of guidelines for an Accumulo Blog. In accordance with the
> > bylaws
> > this vote will require Lazy Approval to pass and will remain open for 3
> > business days. I'll tally the votes on Thursday morning.
> >
> >
> >
> > 1. The blog will be hosted on the Apache Blogs site[1].
> >
> > 2. The blog will be set up using the instructions at [2] to enable
> > public preview.
> >
> > 3. Proposed blog content will be posted in full-text or link form to
> > the dev mailing list.
> >
> > 4. Blog content requires Lazy Approval votes that are open for at
> > least 3 days.
> >
> > 5. Content may be cross-posted from other sites provided that the
> > content is more than just a link to the other site. The full text of the
> > original article is preferred.
> >
> > 6. Content may be cross-posted to other sites provided that there is a
> > link back to the Accumulo blog site.
> >
> >
> >
> > [1] http://blogs.apache.org/
> >
> > [2] http://www.apache.org/dev/project-blogs
> >
> >
> >
> >
> >
> >
> >
>
>
> --
> // Bill Havanki
> // Solutions Architect, Cloudera Govt Solutions
> // 443.686.9283
>


Timing / orchestration in Trident flows

2014-04-18 Thread Corey Nolet
I'm trying to do some timed/count-based orchestrations of streams in
Storm/Trident. Some of my timing problems include timed or count-based
emissions of tuples from aggregators and tumbling windows (whereby I'm
batching up data and I emit every so often or I emit on every 500th tuple).

I am beginning to play around with using sliding windows with a static
Storm topology that allows me to specify "flows" of data that can cycle in
parallel through the topology so that I can do things like collect into
windows for each grouped set of fields and aggregate counts based on fields
and emit every 5 seconds.



Other things I want to do are more CEP-based like creating a stop-gate
filter where the gate will close when a window fills up with 5 tuples and
the time difference between the first and last tuples is less than or equal
to 2 seconds. A closed gate will basically filter tuples for 10 minutes
until the gate is opened again and the logic repeats.
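
To make the stop-gate idea concrete, here is a minimal sketch of the gate
logic on its own, outside of any Storm/Trident API (one interpretation: the
triggering tuple still passes, and subsequent tuples are filtered while the
gate is closed):

import java.util.ArrayDeque;
import java.util.Deque;

public class StopGate {

    private static final int WINDOW_SIZE = 5;
    private static final long MAX_SPAN_MILLIS = 2 * 1000L;
    private static final long CLOSED_MILLIS = 10 * 60 * 1000L;

    private final Deque<Long> window = new ArrayDeque<Long>();
    private long closedUntil = 0L;

    /** Returns true if the tuple should pass, false if the gate is currently closed. */
    public boolean offer(long timestampMillis) {
        if (timestampMillis < closedUntil) {
            return false; // gate closed: filter the tuple
        }
        window.addLast(timestampMillis);
        if (window.size() > WINDOW_SIZE) {
            window.removeFirst(); // keep only the newest WINDOW_SIZE timestamps
        }
        if (window.size() == WINDOW_SIZE
                && window.peekLast() - window.peekFirst() <= MAX_SPAN_MILLIS) {
            closedUntil = timestampMillis + CLOSED_MILLIS; // burst detected: close the gate
            window.clear();
        }
        return true;
    }
}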

I noticed a comment at the bottom of the Trident API overview that states "*You
might be wondering- how do you do something like a "windowed join", where
tuples from one side of the join are joined against the last hour of tuples
from the other side of the join. To do this,  you would make use of
partitionPersist and stateQuery. The last hour of tuples from one side of
the join would be stored and rotated in a source of state, keyed by the
join field. Then the stateQuery would do lookups by the join field and
perform the "join".*"

I've been using Storm and streams processing for some time but I'm very new
to some of the concepts underlying Trident like state and batching. Because
of that, the quote above isn't making much sense to me. How would I best
implement the use-cases above using such a paradigm (if it exists?). I've
looked at previous posts on using CEP in Storm and I would certainly be
happy to write my own generic windowing functions, but I'd like to get as
much for free out of Trident as possible. For instance, if I write my own
time-emitted, count-evicted tumbling window aggregator, I'm not using the
great aggregation functions already supplied in Trident and it seems like a
hack to me.

Thanks!


Re: [VOTE] Accumulo Blog

2014-04-14 Thread Corey Nolet
+1


On Mon, Apr 14, 2014 at 11:18 AM, Joey Echeverria wrote:

> +1 (non-binding)
>
> --
> Joey Echeverria
> Chief Architect
> Cloudera Government Solutions
>
>
> On Mon, Apr 14, 2014 at 11:16 AM, Josh Elser  wrote:
> > +1
> >
> >
> > On 4/13/14, 8:11 PM, dlmar...@comcast.net wrote:
> >>
> >> I have reviewed the feedback from the proposal thread and consolidated
> it
> >> into a set of guidelines for an Accumulo Blog. In accordance with the
> >> bylaws
> >> this vote will require Lazy Approval to pass and will remain open for 3
> >> business days. I'll tally the votes on Thursday morning.
> >>
> >>
> >>
> >> 1.   The blog will be hosted on the Apache Blogs site[1].
> >>
> >> 2.   The blog will be set up using the instructions at [2] to enable
> >> public preview.
> >>
> >> 3.   Proposed blog content will be posted in full-text or link form
> to
> >> the dev mailing list.
> >>
> >> 4.   Blog content requires Lazy Approval votes that are open for at
> >> least 3 days.
> >>
> >> 5.   Content may be cross-posted from other sites provided that the
> >> content is more than just a link to the other site. The full text of the
> >> original article is preferred.
> >>
> >> 6.   Content may be cross-posted to other sites provided that there
> is
> >> a
> >> link back to the Accumulo blog site.
> >>
> >>
> >>
> >> [1] http://blogs.apache.org/
> >>
> >> [2] http://www.apache.org/dev/project-blogs
> >>
> >>
> >>
> >>
> >>
> >>
> >
>


Re: [PROPOSAL] Accumulo Blog

2014-04-10 Thread Corey Nolet
Chris, would you be in favor of forwarding blog posts to G+ so that they can
still be provided to that community?


On Thu, Apr 10, 2014 at 4:14 PM, Corey Nolet  wrote:

> I'm in favor of full reposts wherever possible. It may be duplication of
> content, but it validates for many that the content has been approved by
> the community. While the content is being republished, I'm still in favor
> of posting a link to the original blog post (if applicable).
>
> I find a blog useful when it's from a reputable source, it's easy to find,
> and what I need is right there. I, personally, wouldn't find it as useful
> if I searched a blog and then had to go somewhere else to find the actual
> content.
>
>
> On Thu, Apr 10, 2014 at 3:54 PM, Bill Havanki 
> wrote:
>
>> I think that would be splendid, Don. :)
>>
>> I'd be happy to help out with getting this set up. I'm in favor of using
>> Apache's blog infrastructure, at least at first, since it's ready to go
>> and
>> explicitly for this purpose. I like the sense of place it provides, vs. a
>> loose topic on G+ / elsewhere.
>>
>> - I'm not a fan of just posting links to articles elsewhere. There should
>> be at least a short, complete passage for each post with a link to the
>> full
>> thing, if not a full repost.
>> - Lazy approval sounds fine to me, since a PMC member has to post the
>> content anyway.
>>
>>
>> On Thu, Apr 10, 2014 at 3:23 PM, Donald Miner > >wrote:
>>
>> > Is this something i can volunteer to help manage if nobody else wants
>> to?
>> > Do things like set it up, collect blog posts from authors, edit them,
>> post
>> > them, manage the draft and vote process, etc.
>> >
>> > Just putting that out there as i see it as a way i can contribute to the
>> > community and i also personally think it is a good idea.
>> >
>> > -d
>> >
>> > > On Apr 10, 2014, at 1:59 PM, Mike Drob  wrote:
>> > >
>> > > Not sure how I feel about the Google+ community. As the PMC, aren't we
>> > > responsible for brand management?
>> > >
>> > >
>> > >> On Thu, Apr 10, 2014 at 1:43 PM, Christopher 
>> > wrote:
>> > >>
>> > >> Personally, I'd find it easier to simply suggest people post to a
>> > >> common Google+ topic/community, when there's something of community
>> > >> interest to blog about, rather than maintain a monolithic blog.
>> > >>
>> > >> There may be others with the same topic/name, but this one is the one
>> > >> I saw first:
>> > >> https://plus.google.com/communities/117836301734017142321
>> > >>
>> > >> --
>> > >> Christopher L Tubbs II
>> > >> http://gravatar.com/ctubbsii
>> > >>
>> > >>
>> > >>> On Thu, Apr 10, 2014 at 12:12 PM,   wrote:
>> > >>> I am proposing a blog for the project to be hosted on the
>> > >> blogs.apache.org site. There was a similar proposal last year on the
>> > dev
>> > >> list [1], but no vote or decision. Apache has a web page with setup
>> > >> instructions [2], which also states that the PMC is responsible for
>> the
>> > >> blog content and for granting write access to the blog. The process
>> for
>> > >> setting up a blog is easy and defined in [2].
>> > >>>
>> > >>>
>> > >>> To move forward I think we need to resolve some items:
>> > >>>
>> > >>> 1. The bylaws don't define how to vote on blog content, but the
>> default
>> > >> vote is in a Lazy Approval fashion, with no defined timeframe. I'm
>> > thinking
>> > >> 3 days. Since the PMC is responsible for the content, should we
>> enforce
>> > >> something different, say, consensus or majority approval from active
>> PMC
>> > >> members over 3 days?
>> > >>>
>> > >>> 2. Guidelines for content. If we accept cross-posts from other
>> sites or
>> > >> blog posts from guest writers (non-contributors, non-committers),
>> what
>> > >> rules should be enforced (PMC is responsible for content)? For any
>> > author,
>> > >> can their employer or employer's products be mentioned?
>> > >>>
>> > >>> 3. Do the articles need to be Apache licensed?
>> > >>>
>> > >>> [1]
>> > >>
>> >
>> http://mail-archives.apache.org/mod_mbox/accumulo-dev/201311.mbox/%3CCAD-fFU%2B7ZqoVGYMzN%3D09dv9fMSv%2BF32XbsMubsw9HTZ6n155rg%40mail.gmail.com%3E
>> > >>> [2] http://www.apache.org/dev/project-blogs
>> > >>
>> >
>>
>>
>>
>> --
>> // Bill Havanki
>> // Solutions Architect, Cloudera Govt Solutions
>> // 443.686.9283
>>
>
>


Re: [PROPOSAL] Accumulo Blog

2014-04-10 Thread Corey Nolet
I'm in favor of full reposts wherever possible. It may be duplication of
content, but it validates for many that the content has been approved by
the community. While the content is being republished, I'm still in favor
of posting a link to the original blog post (if applicable).

I find a blog useful when it's from a reputable source, it's easy to find,
and what I need is right there. I, personally, wouldn't find it as useful
if I searched a blog and then had to go somewhere else to find the actual
content.


On Thu, Apr 10, 2014 at 3:54 PM, Bill Havanki wrote:

> I think that would be splendid, Don. :)
>
> I'd be happy to help out with getting this set up. I'm in favor of using
> Apache's blog infrastructure, at least at first, since it's ready to go and
> explicitly for this purpose. I like the sense of place it provides, vs. a
> loose topic on G+ / elsewhere.
>
> - I'm not a fan of just posting links to articles elsewhere. There should
> be at least a short, complete passage for each post with a link to the full
> thing, if not a full repost.
> - Lazy approval sounds fine to me, since a PMC member has to post the
> content anyway.
>
>
> On Thu, Apr 10, 2014 at 3:23 PM, Donald Miner  >wrote:
>
> > Is this something i can volunteer to help manage if nobody else wants to?
> > Do things like set it up, collect blog posts from authors, edit them,
> post
> > them, manage the draft and vote process, etc.
> >
> > Just putting that out there as i see it as a way i can contribute to the
> > community and i also personally think it is a good idea.
> >
> > -d
> >
> > > On Apr 10, 2014, at 1:59 PM, Mike Drob  wrote:
> > >
> > > Not sure how I feel about the Google+ community. As the PMC, aren't we
> > > responsible for brand management?
> > >
> > >
> > >> On Thu, Apr 10, 2014 at 1:43 PM, Christopher 
> > wrote:
> > >>
> > >> Personally, I'd find it easier to simply suggest people post to a
> > >> common Google+ topic/community, when there's something of community
> > >> interest to blog about, rather than maintain a monolithic blog.
> > >>
> > >> There may be others with the same topic/name, but this one is the one
> > >> I saw first:
> > >> https://plus.google.com/communities/117836301734017142321
> > >>
> > >> --
> > >> Christopher L Tubbs II
> > >> http://gravatar.com/ctubbsii
> > >>
> > >>
> > >>> On Thu, Apr 10, 2014 at 12:12 PM,   wrote:
> > >>> I am proposing a blog for the project to be hosted on the
> > >> blogs.apache.org site. There was a similar proposal last year on the
> > dev
> > >> list [1], but no vote or decision. Apache has a web page with setup
> > >> instructions [2], which also states that the PMC is responsible for
> the
> > >> blog content and for granting write access to the blog. The process
> for
> > >> setting up a blog is easy and defined in [2].
> > >>>
> > >>>
> > >>> To move forward I think we need to resolve some items:
> > >>>
> > >>> 1. The bylaws don't define how to vote on blog content, but the
> default
> > >> vote is in a Lazy Approval fashion, with no defined timeframe. I'm
> > thinking
> > >> 3 days. Since the PMC is responsible for the content, should we
> enforce
> > >> something different, say, consensus or majority approval from active
> PMC
> > >> members over 3 days?
> > >>>
> > >>> 2. Guidelines for content. If we accept cross-posts from other sites
> or
> > >> blog posts from guest writers (non-contributors, non-committers), what
> > >> rules should be enforced (PMC is responsible for content)? For any
> > author,
> > >> can their employer or employer's products be mentioned?
> > >>>
> > >>> 3. Do the articles need to be Apache licensed?
> > >>>
> > >>> [1]
> > >>
> >
> http://mail-archives.apache.org/mod_mbox/accumulo-dev/201311.mbox/%3CCAD-fFU%2B7ZqoVGYMzN%3D09dv9fMSv%2BF32XbsMubsw9HTZ6n155rg%40mail.gmail.com%3E
> > >>> [2] http://www.apache.org/dev/project-blogs
> > >>
> >
>
>
>
> --
> // Bill Havanki
> // Solutions Architect, Cloudera Govt Solutions
> // 443.686.9283
>


Re: Accumulo and OSGi

2014-04-10 Thread Corey Nolet
Geoffry,

Unfortunately, I'm not yet able to publish a patch for the work I've done.
I hope some of the resources I've provided can be a good starting place for
you. I can certainly be available to help you through some of the painful
problems that I went through, but there's a lag between my daily work and
my ability to commit the work back to Apache, so I can't promise when that
will happen.

My vote would obviously be yes; I think it would be very useful to have
Accumulo's artifacts be OSGi bundles.



On Thu, Apr 10, 2014 at 10:59 AM, Luk Vervenne  wrote:

> Yes
>
> On 10 Apr 2014, at 16:53,  
> wrote:
>
> > Yes
> >
> > -Original Message-
> > From: Josh Elser [mailto:josh.el...@gmail.com]
> > Sent: Thursday, April 10, 2014 9:32 AM
> > To: user@accumulo.apache.org
> > Subject: Re: Accumulo and OSGi
> >
> >> You say the community would be well-accepting of bundling up the
> >> Accumulo client.  If that's the case, I'd like to hear from them.
> >
> > Yes :)
> >
>
>


Re: Accumulo and OSGi

2014-04-09 Thread Corey Nolet
Geoffry,

Interesting you have Hadoop working in Karaf.  I'm using equinox.

Sure, but are we talking Karaf-specific features here? You just want a
Hadoop Client bundle that works, right? The author of the Karaf-Hadoop
project already worked through the classpath issues so it's not a bad
starting place. Same with the ServiceMix features files.

If I understand you correctly, I only need have Text available.  I'll look
into that. It does answer my question and maybe I can avoid the JAAS
mishmash.

Without needing a connection to the FileSystem, can the JAAS stuff be set
as an optional import?

Close on to 100% of my troubles has always been with getting 3rd party
libraries working.  Accumulo/Hadoop is only the latest round.

You are getting no arguments from me. I've been using Karaf consistently
for a few years now and no doubt most of my time maintaining the services
is spent making sure 3rd party things work (and continue to work after
updates). I've come to expect it at this point.

It's security, scaleability, relationship to Hadoop's HDFS and MR all
conspire to make it attractive.  But creating an uberbundle?

Sure, packaging all the Accumulo classes into a single bundle got me up and
running. I still took time wrapping all of its transitive dependencies in
their own bundles and wiring up the imports/exports, but it works.

As mentioned in previous posts, I think the community would be
well-accepting of a first-class Accumulo bundle. If you are interested in
taking on the work, the ticket I posted previously is open and ready for a
patch.

Both of these were advised by one of the bndtools gurus--neither worked.
 When I did the Import-Package other things broke.

I had to import several nested packages to get mine to work properly.
Which other things broke when you did the Import-Package of the JAAS
packages?



On Wed, Apr 9, 2014 at 11:49 AM, Geoffry Roberts wrote:

> Corey,
>
>
> Interesting you have Hadoop working in Karaf.  I'm using equinox.  It also
> sounds as if I don't need to access HFDS in order to get Accumulo to work
> in OSGi. If I understand you correctly, I only need have Text available.
>  I'll look into that. It does answer my question and maybe I can avoid
> the JAAS mishmash.
>
>
> I have been using OSGi off and on for a few years now. Close on to 100% of
> my troubles has always been with getting 3rd party libraries working.
>  Accumulo/Hadoop is only the latest round.
>
>
> I'm gearing to do some major work based on Accumulo.  It's security,
> scaleability, relationship to Hadoop's HDFS and MR all conspire to make it
> attractive.  But creating an uberbundle? I'm sure I could get it working as
> a proof of concept, but will it play in prime-time?
>
>
> How hard would it be to make a proper Accumulo bundle?  Would the
> community accept it?
>
>
> Thanks
>
>
> So far as my travails with JAAS is concerned, I did this in my bndtools
> *.bndrun file:
>
>
> -runproperties:
> org.osgi.framework.system.packages.extra=com.sun.security.auth.module
>
> I also tried:
> Import-Package: com.sun.security.auth.module
> in my bundle that calls Hadoop.
>
> Both of these were advised by one of the bndtools gurus--neither worked.
>  When I did the Import-Package other things broke.
>
> On Wed, Apr 9, 2014 at 9:46 AM, Corey Nolet  wrote:
>
>> Geoffry,
>>
>> As Josh pointed out, you should only need the Hadoop libraries on the
>> client side to use the Text object. This means you won't have to go through
>> the pain of placing the xml files in your root bundles.
>>
>> Did you try the JAAS export from the packages in your container? Did that
>> help?
>>
>> I agree with your comment that we *should* be able to look at these
>> bundles as black boxes but we're not dealing with true bundles, yet. Once I
>> got my Hadoop client bundle ready to go, I haven't had to touch it (and the
>> Hadoop Karaf project solved most of the import/export guessing work for me
>> already).  For Accumulo, on the other hand, it's up to you if you want to
>> create an uber-bundle with the necessary Accumulo packages or just wrap
>> each one and provide the necessary imports/exports across them. For my
>> needs, I just created one uber-bundle for Accumulo packages and imported
>> the necessary non-Accumulo packages (many of those I had to wrap as well).
>> I didn't find that part too painful on the Accumulo side. While not being
>> the ideal situation from the OSGI side, with lack of an existing bundle
>> artifact, it did solve my problem and I'm actively using both Accumulo and
>> Hadoop in OSGi. My recommendation would be that until we get a proper
>> bundle

Re: Accumulo and OSGi

2014-04-09 Thread Corey Nolet
Geoffry,

As Josh pointed out, you should only need the Hadoop libraries on the
client side to use the Text object. This means you won't have to go through
the pain of placing the xml files in your root bundles.

Did you try the JAAS export from the packages in your container? Did that
help?

I agree with your comment that we *should* be able to look at these bundles
as black boxes but we're not dealing with true bundles, yet. Once I got my
Hadoop client bundle ready to go, I haven't had to touch it (and the Hadoop
Karaf project solved most of the import/export guessing work for me
already).  For Accumulo, on the other hand, it's up to you if you want to
create an uber-bundle with the necessary Accumulo packages or just wrap
each one and provide the necessary imports/exports across them. For my
needs, I just created one uber-bundle for Accumulo packages and imported
the necessary non-Accumulo packages (many of those I had to wrap as well).
I didn't find that part too painful on the Accumulo side. While not the
ideal situation from the OSGi side, given the lack of an existing bundle
artifact, it did solve my problem and I'm actively using both Accumulo and
Hadoop in OSGi. My recommendation would be that until we get a proper
bundle, that solution would certainly work in the short-term.

I believe Josh posted this already but check out [1]. A ready-to-go OSGi
bundle for Accumulo would be useful but the Hadoop client dependency would
need to be wrapped (or exposed as its own bundle). IMO, with proper
documentation this shouldn't be too painful for users. Thoughts?


[1] https://issues.apache.org/jira/browse/ACCUMULO-2518


On Mon, Apr 7, 2014 at 11:27 AM, Geoffry Roberts wrote:

> Ahh,  let me try and address where I might have gone off the linguistic
> reservation.
>
> bndtools -- is an eclipse plugin that is very helpful when developing OSGi
> bundles.  It does a lot of grimy, boilerplate things for you.
>
> inlining -- is where one places dependent *.jar files inside the OSGi
> bundle and therefore on said bundle's class path.  It tends to promote
> bloated bundles--not in the spirit of OSGi--but sometimes necessary.
>
> componentizing -- is the business of converting a class into a component.
>  In the bndtools way of doing things, this can be as easy as annotating a
> class with @Component.
>
> bundle -- You probably know what this is already, but I'll include it for
> good measure.  A bundle is a body of code that is on the same class path,
> and often acts as a service to other bundles.
>
> I don't know what could be done upstream other than making Accumulo's
> client OSGi ready.  Would we like to do that?
>
>
> On Mon, Apr 7, 2014 at 11:02 AM, Josh Elser  wrote:
>
>> You just used a lot of words that don't mean anything to me :)
>>
>> Hopefully you don't have to do much on your own. If there are things we
>> can change upstream to make this process easier, please feel free to let us
>> know.
>>
>>
>> On 4/7/14, 10:55 AM, Geoffry Roberts wrote:
>>
>>> Thanks Josh,
>>>
>>> My container for the moment is equinox, but all should work in Felix as
>>> well.  I've been using bndtools for my other OSGi work so I'm faced with
>>> either annotating the Accumulo Code or wrapping it somehow.  What do you
>>> want to bet I wind up inlining it?  Still, the annotated (read
>>> componentized) approach would be less kludgy.  I hesitate because I'd
>>> wind up maintaining my own code line.
>>>
>>>
>>> On Mon, Apr 7, 2014 at 10:28 AM, Josh Elser >> > wrote:
>>>
>>> On 4/7/14, 10:07 AM, Geoffry Roberts wrote:
>>>
>>> My original question remains: Is the Accumulo Client dependent
>>> on the
>>> Hadoop Client fully?  This determination can be made through
>>> trial and
>>> error.  But I'm looking to leverage OPE (other people's
>>> experience) if
>>> it exists.
>>>
>>>
>>> I thought someone had already said this (but I may be confusing
>>> threads): the Accumulo API uses Text throughout. Hadoop is a
>>> required dependency.
>>>
>>>
>>> In the same spirit, does anyone know if all the following are
>>> required
>>> to run an Accumulo Client? core, fate, start, trace?  If I
>>> attempt to
>>> OSGify, I'm trying to figure how much trouble am I getting into.
>>>
>>>
>>> Yes, that should be about it from within Accumulo. You might need
>>> some other foss dependencies also available, but I'm not aware on
>>> what your "container" (or w/e the proper terminology would be)
>>> provides.
>>>
>>>
>>>
>>>
>>> --
>>> There are ways and there are ways,
>>>
>>> Geoffry Roberts
>>>
>>
>
>
> --
> There are ways and there are ways,
>
> Geoffry Roberts
>


Re: Accumulo and OSGi

2014-04-06 Thread Corey Nolet
Geoffry,

My quick answer is that I needed to adjust my container (Karaf in my case)
to export the JAAS packages, because they ship with the JRE. Then I needed
to make the Hadoop bundle import them.

Also before I forget, Hadoop packages its default xml configurations
(core-site.xml, core-default.xml, etc...) in the default namespace so they
can't be exported. For this reason, you need to make sure they are included
at the root bundle level of any bundles needing to use the client (this
should pertain to the libraries needing the filesystem object, not bundles
that just  need the Text object, for instance...).

That's my quick answer from my phone. I'll look into the rest and provide a
more detailed description of what I did.

I went through this headache too, and it would help to capture the solution
somewhere (like this thread).
On Apr 6, 2014 11:18 AM, "Geoffry Roberts"  wrote:

> All,
>
> To what extent does the Accumulo Client rely on the Hadoop Client?  I
> apologize if the question is a bit obtuse.  But I got into dependency weeds
> trying to get the Hadoop Client to work in OSGI.  (See below Hadoop Client
> woes)   I am now wondering if I OSGified Accumulo's client would I
> encounter the same-old-same-old or somehow dodge the bullet.
>
> Would anyone else be interested?
>
> 
> I sat down to do what Corey suggested and took a shot at getting the
> Hadoop Client working in OSGi.  I used the service mix bundles, but alas,
> it seems that somewhere in the dependencies something wants to use JAAS and
> that is stopping the show.  I tried creating a fragment that exports the
> required packages so OSGi can find them--nothing doing; I'm stuck.
>
> If one Googles, one finds JAAS is problematic in OSGi as are a number of
> J2EE technologies.
> 
>
>
> On Tue, Apr 1, 2014 at 11:22 AM, Geoffry Roberts 
> wrote:
>
>> Thank you Corey,
>>
>> I was unaware of the service mix Hadoop client.  It's funny that no one
>> on the Hadoop list ever mentioned it.
>>
>> You say you have 1.4 working in OSGi. Did you do a proper port or just
>> wrap it with something like bnd?   I have Hadoop 2.3.0 so I need to use
>> Accumulo 1.5.1.  I'm glad to hear the bane of split packages has been put
>> asunder.
>>
>> I am using equinox for now.  Most of my legacy code in based on bndtools
>> and I exercise it in equinox.  I've used equinox in the past for production
>> and things went well.
>>
>> I just grabbed the Hadoop core & client from service mix then zookeeper.
>> At the least, they were accepted into my bndtools repository as valid
>> bundles.  I noticed that a number of the usual Hadoop dependencies are in
>> service mix as well so I sense a glimmer of hope. Wrt Accumulo, I think
>> I'll take a stab at either wrapping or porting core, fate, start, and
>> trace to OSGi and see how much trouble I get into.
>>
>> It's either do the above or abandon OSGi altogether for this project.
>>
>> My objective is to persist EMF object graphs into Accumulo.  These graphs
>> were built by others and are based on ISO standards so I need to walk the
>> straight and narrow and not drop a stitch.  I have code that does what I
>> need (persist any graph sight unseen) into MongoDB.  I need to adapt said
>> code to Accumulo. All the above is OSGi based so it would really help if I
>> can keep the Accumulo end of things in OSGi as well.
>>
>> Wish me well
>>
>>
>> On Mon, Mar 31, 2014 at 8:34 PM, Corey Nolet  wrote:
>>
>>> Geoffry,
>>>
>>> What OSGi container are you using currently? The servicemix Hadoop
>>> bundle should get you going with the Hadoop client dependencies at least
>>> [1]. It looks like one of the servicemix guys created a Hadoop ticket for
>>> making bundles of their jars as well [2], though it doesn't look like
>>> there's been any movement on it.
>>>
>>> I recently had to get the CDH3u4 client code working in Karaf. A good
>>> starting place for me was [3], however I did need to make updates to
>>> versions of many of the dependencies to get it functioning as expected. [3]
>>> will get you at least started with dependent bundles and the proper
>>> imports/exports to get it working.
>>>
>>> I've got the Accumulo client running in OSGi. If I recall correctly,
>>> versions 1.4 and above do not split packages across jars so it's really
>>> just a matter of getting the dependencies right. Zookeeper also ships as a
>>> bundle [4].
>>>
>>>
>>> Hope this helps.
>>>
>>> [1]
&

Re: Query LeaderSelection state

2014-04-05 Thread Corey Nolet
Is it possible to have a client use the LeaderSelector and call the
getLeader() method without being a part of the leadership group?
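
To make it concrete, this is the kind of observer I have in mind -- a rough,
untested sketch that assumes getLeader() can be read from a selector that was
never started (which is exactly the part I'm unsure about); the connect
string and path are placeholders:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderSelector;
import org.apache.curator.framework.recipes.leader.LeaderSelectorListenerAdapter;
import org.apache.curator.framework.recipes.leader.Participant;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class LeaderObserver {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // The listener is a no-op because start() is never called; this
        // instance is only used to read the election state, not to join it.
        LeaderSelector selector = new LeaderSelector(client, "/myapp/leader",
                new LeaderSelectorListenerAdapter() {
                    @Override
                    public void takeLeadership(CuratorFramework c) {
                        // never invoked for this observer
                    }
                });

        Participant leader = selector.getLeader();
        System.out.println("current leader: " + leader.getId());

        client.close();
    }
}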


On Sat, Apr 5, 2014 at 9:35 AM, Jordan Zimmerman  wrote:

> LeaderSelector has the methods getLeader() and getParticipants(). These
> don't solve your problem?
>
> -Jordan
>
>
> From: Corey Nolet cjno...@gmail.com
> Reply: user@curator.apache.org user@curator.apache.org
> Date: April 4, 2014 at 10:26:41 PM
> To: user@curator.apache.org user@curator.apache.org
> Subject:  Query LeaderSelection state
>
>  Hello,
>
> I'm trying to use Curator to implement a high-availability mechanism
> whereby a few different systems run Curator's LeaderSelector recipe. If one
> fails, It's expected that one of the other systems will pick up right where
> the other left off. These systems are servers, however, and I need to find
> a way to have a client know which system is the leader at some point. As
> far as I can tell, the recipe does not provide any classes that allow me to
> query the current leader for a specific namespace.
>
>  Thanks!
>
>


Query LeaderSelection state

2014-04-04 Thread Corey Nolet
Hello,

I'm trying to use Curator to implement a high-availability mechanism
whereby a few different systems run Curator's LeaderSelector recipe. If one
fails, it's expected that one of the other systems will pick up right where
the other left off. These systems are servers, however, and I need to find
a way to have a client know which system is the leader at some point. As
far as I can tell, the recipe does not provide any classes that allow me to
query the current leader for a specific namespace.

Thanks!


Re: [VOTE] Accumulo Bylaws - Bylaw Change Changes

2014-04-04 Thread Corey Nolet
+1


On Fri, Apr 4, 2014 at 11:34 AM, Christopher  wrote:

> +1
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>
>
> On Fri, Apr 4, 2014 at 11:33 AM, David Medinets
>  wrote:
> > +1
> >
> >
> > On Fri, Apr 4, 2014 at 11:21 AM, John Vines  wrote:
> >
> >> This is a proposal to change the Bylaw Change action in the bylaws from
> >> Majority Approval to Consensus Approval. This is being requested because
> >> Bylaw changes are a major change to the project and all discussion
> should
> >> be able to be had without  a borderline majority being able to force
> things
> >> through.
> >>
> >> Specifically, it is the following line which shall be changed
> >> Modifying Bylaws | Modifying this document. | Majority approval |
> >> Active PMC members | 7
> >> to
> >>
> >> Modifying Bylaws | Modifying this document. | Consensus approval |
> >> Active PMC members | 7
> >>
> >> The current bylaws are visible at
> >>
> >> http://accumulo.apache.org/bylaws.html
> >>
> >> This vote will be open for 7 days, until 11 April 2014, 15:20 UTC.
> >>
> >> Upon successful completion of this vote, the first line of the document
> >> body
> >> will be replaced with "This is version 2 of the bylaws," ( or "This is
> >> version 3 of the bylaws," if the vote to change Code Changes passes) and
> >> the aforementioned line will be changed from Majority Approval to
> Consensus
> >> Approval.
> >>
> >> This vote requires majority approval to pass: at least 3 +1 votes and
> more
> >> +1
> >> than -1's.
> >>
> >> [ ] +1 - "I approve of these proposed bylaw changes and accept them for
> >> the Apache Accumulo project."
> >> [ ] +0 - "I neither approve nor disapprove of these proposed bylaw
> changes,
> >> but accept them for the Apache Accumulo project."
> >> [ ] -1 - "I do not approve of these proposed bylaw changes and do not
> >> accept them for the Apache Accumulo project because..."
> >>
> >> Thank you.
> >>
>


Re: [VOTE] Accumulo Bylaws, vote 2

2014-04-03 Thread Corey Nolet
I finally got a chance to fully read through the bylaws and this email
thread.

+1 to approval for the bylaws. Thanks for writing these up!


On Thu, Apr 3, 2014 at 9:59 PM, Sean Busbey wrote:

> Corey,
>
> Just for clarity, is your +1 to Benson's sentiment or to the Bylaws Vote
> for this thread?
>
> -Sean
>
>
> On Thu, Apr 3, 2014 at 8:58 PM, Corey Nolet  wrote:
>
> > +1
> >
> >
> >
> >
> > On Thu, Apr 3, 2014 at 8:45 PM, Benson Margulies  > >wrote:
> >
> > > If you're all going to go spelunking in the Apache policy docs,
> > > perhaps I can help a bit with context.
> > >
> > > The original HTTPD project developed a very specific set of policies
> > > for controlling  _commits to the code base_. The ballet of
> > > -1/veto/justification comes out of there. The overall foundation
> > > policy is an expectation that all projects will apply that same
> > > approach to commits unless they can state a very good reason to do
> > > something else.
> > >
> > > Contrarywise, releases cannot be vetoed. A -1 is just a -1. No veto.
> > > Justification is polite, but not required. Proceeding in the face of a
> > > -1 is not always a good idea, but the policy envisions it; it
> > > envisions that someone might vote -1 because they _might prefer_ to
> > > wait for some other change. But they can just be outvoted.
> > >
> > > Once you get past commits to the codebase and releases, you're more on
> > > your own in deciding how to decide. The particular case at hand, these
> > > bylaws, is an interesting one.
> > >
> > > People should be really clear about what they mean when they propose
> > > consensus as a process. Yes, a consensus process is a process in which
> > > every member of the community has a veto. However, it is also a
> > > process in which every member of the community feels a grave weight of
> > > responsibility in using that veto. Focussing on the veto in a
> > > consensus process is not a good sign.
> > >
> > > Consensus is a slow, deliberative, process, chosen by communities
> > > which value group cohesion over most everything else. It is also a
> > > process that presumes that there is a _status quo_ which is always
> > > acceptable. The community sticks to the status quo until everyone
> > > involved is ready to accept some change. This approach to
> > > decision-making is pretty hard to apply to a new group trying to chart
> > > a new course.
> > >
> > > It is _not_ foundation policy to expect communities to choose
> > > full-blown consensus as the predominant process. Typically, in my
> > > experience, Apache projects do not do full consensus process. Instead,
> > > they strive to give everyone a voice and seek consensus, but
> > > eventually decide via a majority of some kind. Most of the time, the
> > > first part of that (open discussion) achieves a consensus, so that the
> > > second part of that becomes a formality. However, from time to time,
> > > the community chooses to decide by majority in order to decide. The
> > > touchstone of a healthy community is that the minority feel heard and
> > > not steamrolled.
> > >
> >
>


Re: [VOTE] Accumulo Bylaws, vote 2

2014-04-03 Thread Corey Nolet
+1




On Thu, Apr 3, 2014 at 8:45 PM, Benson Margulies wrote:

> If you're all going to go spelunking in the Apache policy docs,
> perhaps I can help a bit with context.
>
> The original HTTPD project developed a very specific set of policies
> for controlling  _commits to the code base_. The ballet of
> -1/veto/justification comes out of there. The overall foundation
> policy is an expectation that all projects will apply that same
> approach to commits unless they can state a very good reason to do
> something else.
>
> Contrarywise, releases cannot be vetoed. A -1 is just a -1. No veto.
> Justification is polite, but not required. Proceeding in the face of a
> -1 is not always a good idea, but the policy envisions it; it
> envisions that someone might vote -1 because they _might prefer_ to
> wait for some other change. But they can just be outvoted.
>
> Once you get past commits to the codebase and releases, you're more on
> your own in deciding how to decide. The particular case at hand, these
> bylaws, is an interesting one.
>
> People should be really clear about what they mean when they propose
> consensus as a process. Yes, a consensus process is a process in which
> every member of the community has a veto. However, it is also a
> process in which every member of the community feels a grave weight of
> responsibility in using that veto. Focussing on the veto in a
> consensus process is not a good sign.
>
> Consensus is a slow, deliberative, process, chosen by communities
> which value group cohesion over most everything else. It is also a
> process that presumes that there is a _status quo_ which is always
> acceptable. The community sticks to the status quo until everyone
> involved is ready to accept some change. This approach to
> decision-making is pretty hard to apply to a new group trying to chart
> a new course.
>
> It is _not_ foundation policy to expect communities to choose
> full-blown consensus as the predominant process. Typically, in my
> experience, Apache projects do not do full consensus process. Instead,
> they strive to give everyone a voice and seek consensus, but
> eventually decide via a majority of some kind. Most of the time, the
> first part of that (open discussion) achieves a consensus, so that the
> second part of that becomes a formality. However, from time to time,
> the community chooses to decide by majority in order to decide. The
> touchstone of a healthy community is that the minority feel heard and
> not steamrolled.
>


Re: Accumulo and OSGi

2014-03-31 Thread Corey Nolet
Geoffry,

What OSGi container are you using currently? The servicemix Hadoop bundle
should get you going with the Hadoop client dependencies at least [1]. It
looks like one of the servicemix guys created a Hadoop ticket for making
bundles of their jars as well [2], though it doesn't look like there's been
any movement on it.

I recently had to get the CDH3u4 client code working in Karaf. A good
starting place for me was [3], however I did need to make updates to
versions of many of the dependencies to get it functioning as expected. [3]
will get you at least started with dependent bundles and the proper
imports/exports to get it working.

I've got the Accumulo client running in OSGi. If I recall correctly,
versions 1.4 and above do not split packages across jars so it's really
just a matter of getting the dependencies right. Zookeeper also ships as a
bundle [4].


Hope this helps.

[1]
http://mvnrepository.com/artifact/org.apache.servicemix.bundles/org.apache.servicemix.bundles.hadoop-core/
[2] https://issues.apache.org/jira/browse/HADOOP-8446
[3] https://github.com/jbonofre/karaf-hadoop
[4] https://issues.apache.org/jira/browse/ZOOKEEPER-425



On Mon, Mar 31, 2014 at 11:37 AM, Geoffry Roberts wrote:

> Luk,
>
> Thanks for the link, but I am a bit lost.  wso2 offers middleware,
> apparently you believe this will help my situation.  If it's not too much,
> can you expand?
>
>
> On Mon, Mar 31, 2014 at 11:13 AM, Luk Vervenne  wrote:
>
>> osgi... see wso2.com
>>
>> On 31 Mar 2014, at 16:58, Geoffry Roberts  wrote:
>>
>> > All,
>> >
>> > I have a project for which Accumulo it appears will serve well.
>>  However, I have a significant amount of code I want to leverage that runs
>> in OSGi.  I don't need for Accumulo itself to be OSGi based but the
>> Accumulo client yes.  I see that the Accumulo client uses all the
>> dependencies of the Hadoop client and therefore is not OSGi ready at this
>> time.  The Hadoop client certainly doesn't do OSGi--I don't think it can
>> even spell it :-)--and attempting to make it so starts turning into a sure
>> path to a long sojourn through dependency hell.  I know, I've tried.
>> >
>> > Nonetheless, I would like to ask: Is there any interest in the Accumulo
>> world of having an OSGi based client for this otherwise very appealing
>> database?
>> >
>> > Thanks mucho
>> > --
>> > There are ways and there are ways,
>> >
>> > Geoffry Roberts
>>
>>
>
>
> --
> There are ways and there are ways,
>
> Geoffry Roberts
>


Re: Couchbase Sqoop Data Locality question

2014-03-18 Thread Corey Nolet
Matt,

Sure, your point is definitely valid for moving data from one completely 
separate distributed system to another. It is definitely not optimal in 
cases where I am using Couchbase as distributed cache on the same nodes as 
my Hadoop cluster. In fact, one of the main powers of Hadoop is its ability 
to maintain knowledge of locality and pass that info down to the map/reduce 
layer so that mappers can be scheduled on nodes closest to the data. 
Network is worlds slower than memory- if I can have a mapper on each node 
just pulling data from its local Couchbase tap instead of hitting the 
network at all, then I'm in a much better position.

I'd also say the same for Couchbase proper- if I could hash the data my way
so that I can control which node it ends up on, I'd be in a better position
with my use of the distributed cache. I want to do streams processing but
give my users the ability to query the cache using ElasticSearch. From what
I've looked at in the Couchbase Java Client, I can fill in an interface to
determine which VBucket a key should end up in, but I'd have to recompile
the client in order to use my specialized hashing function. I don't mind
doing this, but again I have no way to find out which node will host that
vbucket. Your hashing solution works when I want to always guarantee an
even distribution, but I don't always want to guarantee that (or maybe I
know better what a more useful even distribution looks like, based on my
domain's use cases, than Couchbase does with its auto-sharding).

In my environment, I'm using Couchbase as a mutability layer on top of 
Hadoop because my data can change quite frequently for a period of time 
until it's considered immutable and I can vet the data into Accumulo via a
map/reduce job. For this use case, the Sqoop plugin just adds an extra step
of having to write a file in HDFS and then map/reduce over the file- to put 
the data somewhere else. It also adds storage overhead. I ripped out the 
CouchbaseInputFormat from the Sqoop plugin github project. I don't know why 
the version of the Sqoop plugin that works with CDH3 uses the Membase client
to perform the TAP, but for some reason I could not get that to work in
Couchbase 2.x. I changed that to use CouchbaseClient instead of Membase and 
it works fine. I've now got an InputFormat that's correctly pushing the 
data directly to Accumulo but it's based entirely on the network. It would 
definitely benefit from having locality and not wasting precious network 
bandwidth. I'm not an Erlang developer so I don't think pointing me 
directly to an Erlang method would be useful to me- though I know in my 
past experience with Couchbase that some of the methods have been exposed 
via a remote "eval" function (or maybe Erlang does this automatically?). Is 
it possible to use that eval to ask Couchbase on which nodes a vbucket is 
being hosted? It's a function that I'd need to call once during the 
initialization of the inputformat.
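
Just to illustrate the grouping step I'm describing (nothing here is
Couchbase-specific -- the per-vbucket server names would come from the
locator lookup mentioned above, and the actual split class is left out),
here's a tiny self-contained sketch:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class VBucketGrouper {

    // serverForVBucket[i] is the address of the node hosting vbucket i. In
    // the real InputFormat this would come from something like the
    // VBucketNodeLocator lookup described above.
    public static Map<String, List<Integer>> groupByServer(String[] serverForVBucket) {
        Map<String, List<Integer>> byServer = new HashMap<String, List<Integer>>();
        for (int vb = 0; vb < serverForVBucket.length; vb++) {
            List<Integer> vbs = byServer.get(serverForVBucket[vb]);
            if (vbs == null) {
                vbs = new ArrayList<Integer>();
                byServer.put(serverForVBucket[vb], vbs);
            }
            vbs.add(vb);
        }
        return byServer;
    }

    public static void main(String[] args) {
        String[] owners = { "nodeA:11210", "nodeB:11210", "nodeA:11210", "nodeC:11210" };
        // Each entry would become one input split whose "location" is the
        // owning server, so the scheduler can place the mapper next to the data.
        System.out.println(groupByServer(owners));
    }
}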


Thanks again Matt!



On Thursday, March 13, 2014 1:38:39 PM UTC-4, Matt Ingenthron wrote:
>
>  Hi Corey,
>
>   From: Corey Nolet >
> Reply-To: "couc...@googlegroups.com " <
> couc...@googlegroups.com >
> Date: Wednesday, March 12, 2014 8:57 PM
> To: "couc...@googlegroups.com " 
> 
> >
> Subject: Couchbase Sqoop Data Locality question
>  
>   Hello, 
>
>  I'm looking through the source code on github for the couchbase hadoop 
> connector. If I'm understanding correctly, the code that generates the 
> splits takes all the possible VBuckets and breaks them up into groups based 
> on the expected number of mappers set by Sqoop. This means that no matter 
> what, even if a mapper is scheduled on a couchbase node, the reads from the 
> dump are ALWAYS going to be sent over the network instead of possibly 
> pulled from the local node's memory and just funneled into the mapper 
> sitting on that local node.
>   
>  
>  This is true, however…
>
>  Hadoop is designed to run across a cluster of systems distributed on a 
> network.  Couchbase, similarly is designed to be run distributed across 
> systems.  So while you're describing a possible optimization, it's not 
> something that's part of either one right now.
>
>  Actually, the way sqoop runs is ideal for most deployments since it 
> gives us great throughput by splitting/distributing the job.
>
>
>  Looking further into the code in the java couchbase client, I'm seeing a 
> class called "VBucketNodeLocator" which has a method getServerByIndex(int 
> k). If I understand this method, it's allowing me to look up the server 
> that holds the vbucket number k. Is this correct?  If it is correct, would 
> it

Updating state in a route.

2014-03-18 Thread Corey Nolet
I am working on a throttling system for my ingest that will check to see if
my upstream JMS broker is backed up to a particular threshold and, if it is
backed up, begin to route messages to disk instead of sending them to the
database.

I'm wondering the best way to implement this using the correct EIP
paradigms. I'll need to periodically (maybe every 2 minutes) check on the
state of the broker and I was thinking of implementing a quartz route that
would check the state and send a message into my main ingest routes with a
special header (like throttle=true/false).

I was thinking my main ingest routes can maintain their own state and
further set some type of header like "shouldThrottle" on each of their
messages so that I can use content-based routing to determine where the
messages should go:

(header.get(shouldThrottle) == true) ? to('disk') : to('database')
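
In the Java DSL I'm picturing something roughly like this (the endpoint
URIs, the cron expression, and the "brokerDepthChecker" bean are
placeholders for my actual setup, so treat it as a sketch rather than
working code):

import org.apache.camel.Exchange;
import org.apache.camel.Processor;
import org.apache.camel.builder.RouteBuilder;

public class ThrottlingRoutes extends RouteBuilder {

    // Shared flag updated by the periodic check; volatile because the two
    // routes run on different threads. (Could also live in a registry bean.)
    private volatile boolean shouldThrottle = false;

    @Override
    public void configure() throws Exception {

        // Every 2 minutes, ask the broker how backed up it is. The
        // "brokerDepthChecker" bean is a placeholder that is assumed to
        // return a Boolean body indicating whether we're over the threshold.
        from("quartz://throttleCheck?cron=0+0/2+*+*+*+?")
            .to("bean:brokerDepthChecker")
            .process(new Processor() {
                public void process(Exchange exchange) {
                    shouldThrottle = exchange.getIn().getBody(Boolean.class);
                }
            });

        // Main ingest route: stamp each message with the current state and
        // use content-based routing to send it to disk or the database.
        from("jms:queue:ingest")
            .process(new Processor() {
                public void process(Exchange exchange) {
                    exchange.getIn().setHeader("shouldThrottle", shouldThrottle);
                }
            })
            .choice()
                .when(header("shouldThrottle").isEqualTo(true))
                    .to("file:data/spill")
                .otherwise()
                    .to("direct:writeToDatabase");
    }
}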

Does this seem like a reasonable solution or am I going about this the
wrong way? I know EIP is all about trying to keep state in the streams
instead of the processors. I'm not sure if there's a better solution than
having the "state checker" send a message to the main ingest processors so
they can set their own state.


Thanks much!


Camel HDFS Component Documentation

2014-03-18 Thread Corey Nolet
I am currently looking to use the camel hdfs component so that I can have
my camel route output to a sequence file. I dug around the documentation
but didn't see a good way (or any useful examples) of how to format my
exchange so that the output would be correctly written to the sequence
file.

I dug through the code and found this line:

Object key = exchange.getIn().getHeader(HdfsHeader.KEY.name());

I need to manually put this into the header of my exchanges, right?
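
Assuming that's the case, I'm picturing a route roughly like this (the
endpoint URI and the fileType option name are my best guess from the docs,
and the key expression is just an example):

import org.apache.camel.builder.RouteBuilder;

public class HdfsSequenceFileRoute extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        from("jms:queue:ingest")
            // The header name matches HdfsHeader.KEY.name() from the line of
            // component source quoted above; the message body becomes the value.
            .setHeader("KEY", simple("${header.JMSMessageID}"))
            .to("hdfs://namenode:8020/data/ingest?fileType=SEQUENCE_FILE");
    }
}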

Thanks!


Re: Couchbase Sqoop Data Locality question

2014-03-13 Thread Corey Nolet
It appears that method only returns the server at some index in the array. 
Is there no way to find out which server is responsible for a vbucket?

On Thursday, March 13, 2014 12:57:29 AM UTC-4, Corey Nolet wrote:
>
> Hello,
>
> I'm looking through the source code on github for the couchbase hadoop 
> connector. If I'm understanding correctly, the code that generates the 
> splits takes all the possible VBuckets and breaks them up into groups based 
> on the expected number of mappers set by Sqoop. This means that no matter 
> what, even if a mapper is scheduled on a couchbase node, the reads from the 
> dump are ALWAYS going to be sent over the network instead of possibly 
> pulled from the local node's memory and just funneled into the mapper 
> sitting on that local node.
>
> Looking further into the code in the java couchbase client, I'm seeing a 
> class called "VBucketNodeLocator" which has a method getServerByIndex(int 
> k). If I understand this method, it's allowing me to look up the server 
> that holds the vbucket number k. Is this correct?  If it is correct, would 
> it make sense for this to be used in the getSplits() method in the 
> CouchbaseInputFormat so that the splits for the vbuckets can be grouped by 
> the server in which they live? I agree that it may not make sense for many 
> who have their couchbase cluster separate from their hadoop cluster.. but 
> it's a SIGNIFICANT optimization for those who have the two co-located.
>
> Any thoughts?
>
>
> Thanks!
>
>



Couchbase Sqoop Data Locality question

2014-03-12 Thread Corey Nolet
Hello,

I'm looking through the source code on github for the couchbase hadoop 
connector. If I'm understanding correctly, the code that generates the 
splits takes all the possible VBuckets and breaks them up into groups based 
on the expected number of mappers set by Sqoop. This means that no matter 
what, even if a mapper is scheduled on a couchbase node, the reads from the 
dump are ALWAYS going to be sent over the network instead of possibly 
pulled from the local node's memory and just funneled into the mapper 
sitting on that local node.

Looking further into the code in the java couchbase client, I'm seeing a 
class called "VBucketNodeLocator" which has a method getServerByIndex(int 
k). If I understand this method, it's allowing me to look up the server 
that holds the vbucket number k. Is this correct?  If it is correct, would 
it make sense for this to be used in the getSplits() method in the 
CouchbaseInputFormat so that the splits for the vbuckets can be grouped by 
the server in which they live? I agree that it may not make sense for many 
who have their couchbase cluster separate from their hadoop cluster.. but 
it's a SIGNIFICANT optimization for those who have the two co-located.

Any thoughts?


Thanks!



Changing hash function

2014-03-12 Thread Corey Nolet
I've seen a lot of posts on Google mentioning that couchbase gives the 
ability to change the underlying hash algorithm that is used. How would I 
do this? I'm using the java client.

Thanks!



Re: InputFormat to TAP memcached under couchbase

2014-03-12 Thread Corey Nolet
I *think* I may have isolated this issue to a client version- though it 
doesn't make sense to me why the sqoop plugin isn't working. I'm going to 
try upgrading my client libs to the newest version.

On Wednesday, March 12, 2014 4:03:07 PM UTC-4, Corey Nolet wrote:
>
> Would it possible for someone to provide me with an effective example on 
> how to use the TapClient in couchbase/memcached with a couchbase server 
> installation?
>
> I've been banging my head against the wall for days on this. I need to be 
> able to dump out my couchbase keys/values every hour into HDFS so I can 
> map/reduce over them. I'm using CDH3u4 and the Sqoop connector is freezing 
> up when it begins its map/reduce job. I do not have the luxury of updating 
> to the Sqoop CDH4 version unfortunately but I've seen people complaining of 
> the same problems with that version.
>
> What I've tried is using the TapClient with both the Couchbase libraries 
> and the spy memcached libraries in java. Even with exponential backoff, I 
> can't seem to get the TapClient to return a message where I can pull off a 
> key and a value (it appears I get 'null" for getNextmessage() even with an 
> appropriate timeout of 5 minutes).
>
> What can I do to get this to work? I've been using Couchbase behind 
> Twitter Storm to help with caching for CEP. I've also been using it as a 
> real-time query engine of the underlying CEP cache with ElasticSearch for 
> my customer. If I can't dump the data out to HDFS directly, then I may need 
> to look at other options. I am trying to stay away from views because I 
> want to hit memory directly. I'd also like to preserve data locality if 
> possible (connect directly to memcached or tell couchbase exactly which 
> node(s) i'd like to retrieve keys from.
>
> What are my options here?
>
>
> I'm wondering if BigCouch would allow me to do this effectively.
>
> Thanks much!
>
>
> On Monday, March 10, 2014 11:52:57 PM UTC-4, Corey Nolet wrote:
>>
>> I recently tried the Sqoop connector for Couchbase 2 and it doesn't 
>> appear to be working as expected. I have written my own InputFormat here:
>>
>>
>> https://github.com/cjnolet/cloud-toolkit/blob/master/src/main/java/org/calrissian/hadoop/memcached/MemcachedInputFormat.java
>>
>> I haven't gotten a chance to test it yet but I wanted to know if MOXI 
>> would make it hard to get the locality that Im expecting from each of the 
>> memcached instances. When I connect to a memcached instance (backing 
>> couchbase) on port 11211, will each of those memcached instances give me 
>> ALL of the keys in couchbase? or will they only give me the keys that they 
>> contain separately?
>>
>>
>> Thanks!
>>
>



Re: InputFormat to TAP memcached under couchbase

2014-03-12 Thread Corey Nolet
Would it be possible for someone to provide me with an effective example of
how to use the TapClient in couchbase/memcached with a couchbase server 
installation?

I've been banging my head against the wall for days on this. I need to be 
able to dump out my couchbase keys/values every hour into HDFS so I can 
map/reduce over them. I'm using CDH3u4 and the Sqoop connector is freezing 
up when it begins its map/reduce job. I do not have the luxury of updating 
to the Sqoop CDH4 version unfortunately but I've seen people complaining of 
the same problems with that version.

What I've tried is using the TapClient with both the Couchbase libraries 
and the spy memcached libraries in java. Even with exponential backoff, I 
can't seem to get the TapClient to return a message where I can pull off a 
key and a value (it appears I get null from getNextMessage() even with an 
appropriate timeout of 5 minutes).
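
For reference, this is roughly what I've been trying (simplified, and the
constructor/method names are from memory, so they may not be exact):

import java.net.URI;
import java.util.Arrays;
import java.util.concurrent.TimeUnit;

import com.couchbase.client.TapClient;
import net.spy.memcached.tapmessage.ResponseMessage;

public class TapDump {
    public static void main(String[] args) throws Exception {
        TapClient tap = new TapClient(
                Arrays.asList(URI.create("http://couchbase-host:8091/pools")),
                "default", "");

        // Dump everything currently in the bucket under a named tap stream.
        tap.tapDump("hdfs-export");

        while (tap.hasMoreMessages()) {
            // This is where I see null even with a long timeout.
            ResponseMessage msg = tap.getNextMessage(5, TimeUnit.MINUTES);
            if (msg == null) {
                continue;
            }
            System.out.println(msg.getKey() + " -> " + new String(msg.getValue()));
        }
        tap.shutdown();
    }
}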

What can I do to get this to work? I've been using Couchbase behind Twitter 
Storm to help with caching for CEP. I've also been using it as a real-time 
query engine of the underlying CEP cache with ElasticSearch for my 
customer. If I can't dump the data out to HDFS directly, then I may need to 
look at other options. I am trying to stay away from views because I want 
to hit memory directly. I'd also like to preserve data locality if possible 
(connect directly to memcached or tell couchbase exactly which node(s) i'd 
like to retrieve keys from.

What are my options here?


I'm wondering if BigCouch would allow me to do this effectively.

Thanks much!


On Monday, March 10, 2014 11:52:57 PM UTC-4, Corey Nolet wrote:
>
> I recently tried the Sqoop connector for Couchbase 2 and it doesn't appear 
> to be working as expected. I have written my own InputFormat here:
>
>
> https://github.com/cjnolet/cloud-toolkit/blob/master/src/main/java/org/calrissian/hadoop/memcached/MemcachedInputFormat.java
>
> I haven't gotten a chance to test it yet but I wanted to know if MOXI 
> would make it hard to get the locality that Im expecting from each of the 
> memcached instances. When I connect to a memcached instance (backing 
> couchbase) on port 11211, will each of those memcached instances give me 
> ALL of the keys in couchbase? or will they only give me the keys that they 
> contain separately?
>
>
> Thanks!
>



Re: InputFormat to TAP memcached under couchbase

2014-03-11 Thread Corey Nolet
This kind of answers my question- so if I hit port 11211 and do a tap on 
the underlying memcached instance, I will get all the keys that exist ONLY 
on that memcached instance, correct? Will I get duplicate keys on different 
nodes because of the replicas?

Thanks!


On Tuesday, March 11, 2014 12:10:37 AM UTC-4, Aliaksey Kandratsenka wrote:
>
>
>
>
> On Mon, Mar 10, 2014 at 8:52 PM, Corey Nolet 
> > wrote:
>
>> I recently tried the Sqoop connector for Couchbase 2 and it doesn't 
>> appear to be working as expected. I have written my own InputFormat here:
>>
>>
>> https://github.com/cjnolet/cloud-toolkit/blob/master/src/main/java/org/calrissian/hadoop/memcached/MemcachedInputFormat.java
>>
>> I haven't gotten a chance to test it yet but I wanted to know if MOXI 
>> would make it hard to get the locality that Im expecting from each of the 
>> memcached instances. When I connect to a memcached instance (backing 
>> couchbase) on port 11211, will each of those memcached instances give me 
>> ALL of the keys in couchbase? or will they only give me the keys that they 
>> contain separately?
>>
>
> It looks like you're expecting moxi to support tap. But moxi does not 
> support TAP.
>
>



InputFormat to TAP memcached under couchbase

2014-03-10 Thread Corey Nolet
I recently tried the Sqoop connector for Couchbase 2 and it doesn't appear 
to be working as expected. I have written my own InputFormat here:

https://github.com/cjnolet/cloud-toolkit/blob/master/src/main/java/org/calrissian/hadoop/memcached/MemcachedInputFormat.java

I haven't gotten a chance to test it yet but I wanted to know if MOXI would 
make it hard to get the locality that I'm expecting from each of the 
memcached instances. When I connect to a memcached instance (backing 
couchbase) on port 11211, will each of those memcached instances give me 
ALL of the keys in couchbase? or will they only give me the keys that they 
contain separately?


Thanks!



Re: locality in underlying memcached

2014-03-08 Thread Corey Nolet
Matt. Thanks for the reply.

Would it be possible to do that with the elasticsearch plugin too?
Milliseconds seem expensive to me, but I think my biggest problem is that
with my current approach, I'm often running into "timed out waiting for
value" exceptions (even when I set the timeout to something like 1 minute).
I recently updated to 2.5.0 and I'm still getting the exception. Is there
no way to give Couchbase a hint about where I'd like it to handle a
specific key? Do you have any ideas for how to help with the exception? I
also receive a "failure on node ..." exception quite frequently too.



On Friday, March 7, 2014 9:44:05 PM UTC-5, Corey Nolet wrote:
>
> Hello,
>
> I've got a unique use-case where I want to use memcached as a backing 
> cache behind Twitter Storm so that I can place keys and values and have 
> Couchbase index them in views as well as ElasticSearch so that users can 
> query on the data that's being populated from Storm.
>
> Storm has a unique feature that will allow me to guarantee that the same 
> item i'm processing will always be sent to the same node/processor. For 
> instance, if i have documents with keys "A, B, and C" and I have 3 nodes 
> running the same storm topology, I can guarantee that A gets put on the 
> same node, B gets put on the same node, and C gets put on the same node. 
> From what I've read, If I use the MemcachedClient in the couchbase java 
> client API instead of the couchbase client and have each of my storm nodes 
> only talk to their localhost memcached clien, I should be able to guarantee 
> that the keys/values stay on that local node, correct? It's important for 
> me as well because I'm manipulating these keys and values quite frequently. 
> Are there any gotchas I should be aware of when considering this approach?
>
> Moxy will not automatically balance out these keys and put them on foreign 
> nodes as long as my topology doesn't change, correct? If nodes are ever 
> added, both Couchbase and Storm will also be added.
>
> Thanks!
>



locality in underlying memcached

2014-03-07 Thread Corey Nolet
Hello,

I've got a unique use-case where I want to use memcached as a backing cache 
behind Twitter Storm so that I can place keys and values and have Couchbase 
index them in views as well as ElasticSearch so that users can query on the 
data that's being populated from Storm.

Storm has a unique feature that will allow me to guarantee that the same 
item i'm processing will always be sent to the same node/processor. For 
instance, if i have documents with keys "A, B, and C" and I have 3 nodes 
running the same storm topology, I can guarantee that A gets put on the 
same node, B gets put on the same node, and C gets put on the same node. 
From what I've read, if I use the MemcachedClient in the couchbase java 
client API instead of the couchbase client and have each of my storm nodes 
only talk to their localhost memcached client, I should be able to guarantee 
that the keys/values stay on that local node, correct? It's important for 
me as well because I'm manipulating these keys and values quite frequently. 
Are there any gotchas I should be aware of when considering this approach?
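
In other words, each worker would do nothing more than something like this
(a sketch -- the host/port and key are placeholders), and the hope is that
what it writes never has to leave that box:

import java.net.InetSocketAddress;

import net.spy.memcached.MemcachedClient;

public class LocalCacheWriter {
    public static void main(String[] args) throws Exception {
        // Each Storm worker only ever talks to the bucket port on its own box.
        MemcachedClient local = new MemcachedClient(
                new InetSocketAddress("localhost", 11211));

        // Storm's field grouping guarantees key "A" always lands on this
        // worker, so (hopefully) the cached value for "A" stays local too.
        local.set("A", 0, "value-for-A").get();
        System.out.println(local.get("A"));

        local.shutdown();
    }
}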

Moxi will not automatically balance out these keys and put them on foreign 
nodes as long as my topology doesn't change, correct? If nodes are ever 
added, they will be added to both Couchbase and Storm.

Thanks!



Re: Facet to find possible keys for querying

2014-03-06 Thread Corey Nolet
Thanks for your reply Alex. I have replies inline. 


bq. but you are also putting meta information in the same document -

Correct. My elasticsearch implementation is part of a larger framework. 
Similar to Pig, Hive, Avro, and other data model-agnostic frameworks, I pass 
along a small piece of metadata with each key/value that gets stored on an 
object. This allows models to change without breaking the analytics 
processing or view layers.

bq. you will not be able to execute number range queries on the age of 
people

The model I've given in my previous post is actually a little dumbed down. 
The framework has a value normalization system that knows how to turn 
native datatypes into lexicographically sortable strings (fixed length byte 
arrays or strings for longs/ints, etc...). What I'm showing in my previous 
post is simply a hand typed version of the actual data model.

bq. it might be more useful to have a dedicated index for your field 
configuration, which you can query for the usecase you outlined in your 
post. And you have a dedicated index for the data

This solution sounds wonderful! Is there a way I can do this automatically 
in ElasticSearch? I know one of the things I did in my mappings was to 
bifurcate the indexes for each of the tuples so that I can do exact matches 
against one index and fuzzy matches against the other (I believe I just 
used one with analyzed and one with not_analyzed). Is this where I'd tell 
it to index all the unique tuple key names for me? I agree with you on the 
facets; I'd rather not have to perform an aggregated query on ALL the 
entity types if it's not necessary.


Thanks much!


On Thursday, March 6, 2014 3:59:43 AM UTC-5, Alexander Reelsen wrote:
>
> Hey,
>
> before answering your question, I think the approach of handling your data 
> might be problematic. You are actually mixing two things in your data and 
> your metadata (which is in every document). First the data itself (John 
> Doe, 38 years old), but you are also putting meta information in the same 
> document - maybe it makes more sense to put this data somewhere else (as it 
> hopefully applies for all documents of that type). Also your above approach 
> has another problem, you will not be able to execute number range queries 
> on the age of people, because the value field is configured to be a string 
> - same goes for sorting.
> With that said, it might be more useful to have a dedicated index for your 
> field configuration, which you can query for the usecase you outlined in 
> your post. And you have a dedicated index for the data - splitting those 
> IMO makes a lot of sense. On the other hand I dont know your  data well 
> enough, maybe I am completely wrong.
>
> Back to your original question. If you store a document like the above, 
> and you execute searches on it, the full document always gets returned, not 
> just parts of it. You may want to read into parent-child/nested 
> functionality though (I still do not like that approach).
>
> Facetting can only be done on single fields, so you will not get back the 
> tuple you actually need (you could join them via a script facet, but that 
> seems like another work around) - or again read about parent-child/nested 
> documents (again disliking this, but I guess you know this by now).
>
> One last thing: Its nice to have everything in one query, but dont 
> consider this a must. If two queries solve your problem, it might make more 
> sense.
>
>
> --Alex
>
>
>
>
> On Wed, Mar 5, 2014 at 5:15 AM, Corey Nolet 
> > wrote:
>
>> I forgot to mention, I need the ability for the user to specify they only 
>> care about keys for the entity.type === 'person' (or any type for that 
>> matter).
>>
>>
>> On Tuesday, March 4, 2014 11:13:27 PM UTC-5, Corey Nolet wrote:
>>>
>>> Hello,
>>>
>>> I've got an "entity" document which looks like this:
>>>
>>> {
>>>id: 'id',
>>>type: 'person',
>>>tuples: [
>>>{
>>> key: 'nameFirst',
>>> value: 'john',
>>> type: 'string'
>>> },
>>> {
>>> key: 'age',
>>> value: '38',
>>> type: 'int'
>>> },
>>> {
>>> key: 'nameLast',
>>> value: 'doe',
>>> }   
>>> ]
>>> }
>>>
>>> The tuples field has been mapped in ElasticSearch as a nested type where 
>>> I provide both analyzed and not_analyzed indices for each of the nested 
>

Re: Accumulo site Bootstrapped

2014-03-05 Thread Corey Nolet
+1 It is much cleaner. It would be nice (and more maintainable) to have
sub-menus- specifically for the versions of the docs & packages.


On Wed, Mar 5, 2014 at 8:23 PM, Christopher  wrote:

> +1 for what's been done so far, and for revamped site with 1.6.0 release.
>
> Rollout sub-menus might be nice. That nav bar is pretty busy.
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>
>
> On Wed, Mar 5, 2014 at 6:32 PM, Josh Elser  wrote:
> > Def needs a little more TLC, but using something like Bootstrap instead
> of
> > rolling our own is definitely the way to go.
> >
> > Would be happy to help out here -- maybe we can get a revamped site for
> > 1.6.0? That'd be pretty boss.
> >
> >
> > On 3/5/14, 5:40 PM, Bill Havanki wrote:
> >>
> >> Some folks in the IRC room were discussing how nice the Spark [1] and
> Hue
> >> [2] sites look compared to ours. While babysitting integration tests, I
> >> decided to prototype a rework of our site using Twitter Bootstrap [3],
> the
> >> front-end framework that both of those other sites use.
> >>
> >> Here are the pages that I converted.
> >>
> >> * http://people.apache.org/~bhavanki/accumulo-bootstrapped/
> >> *
> >>
> >>
> http://people.apache.org/~bhavanki/accumulo-bootstrapped/notable_features.html
> >> * http://people.apache.org/~bhavanki/accumulo-bootstrapped/source.html
> >>
> >> You can navigate between those pages using the left nav menu, but try
> >> anywhere else and you'll jump out to the production site.
> >>
> >> The pages use Bootstrap's own theme, with only very slight modifications
> >> to
> >> be close to our own theme. (I actually disabled around 90% of
> >> accumulo.css.) I kept the page organization like production, although we
> >> have many other whizbang options with Bootstrap. Some bits I left messy,
> >> like the nav items for the user manuals, but you should get the idea
> >> anyway.
> >>
> >> Beyond just how it looks, Bootstrap gives you many other capabilities,
> >> especially responsive display for mobile and tablets, so there's benefit
> >> to
> >> a switch beyond just pretty looking boxes.
> >>
> >> [1] spark.apache.org
> >> [2] gethue.com
> >> [3] getbootstrap.com
> >>
> >
>


Re: Facet to find possible keys for querying

2014-03-04 Thread Corey Nolet
I forgot to mention, I need the ability for the user to specify they only 
care about keys for the entity.type === 'person' (or any type for that 
matter).

On Tuesday, March 4, 2014 11:13:27 PM UTC-5, Corey Nolet wrote:
>
> Hello,
>
> I've got an "entity" document which looks like this:
>
> {
>   id: 'id',
>   type: 'person',
>   tuples: [
>     {
>       key: 'nameFirst',
>       value: 'john',
>       type: 'string'
>     },
>     {
>       key: 'age',
>       value: '38',
>       type: 'int'
>     },
>     {
>       key: 'nameLast',
>       value: 'doe'
>     }
>   ]
> }
>
> The tuples field has been mapped in ElasticSearch as a nested type where I 
> provide both analyzed and not_analyzed indices for each of the nested 
> fields (for exact and fuzzy match). What I'm trying to do is find, for each 
> entity's type field, the unique tuple key values along with their 
> associated types.
>
> In other words, I want to write a web service where someone can start 
> typing "n" and I'll return "[{ key:'nameFirst', type:'string'}, { key: 
> 'nameLast', type: 'string' }]" or they could start typing "a" and I'll 
> return "[{ key: 'age', type: 'int' }]. If they don't type anything, I'd 
> like to return the union between the two sets (where it includes nameLast, 
> nameFirst, and age). 
>
> As I'm reading, I'm seeing that this may be done with facets, but I know 
> they have some limitations. Is this something that would be possible to do 
> directly? I'm trying to do this all with one fast query if I can.
>
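
For what it's worth, a rough sketch of how the "one fast query" might look 
with the aggregations added in Elasticsearch 1.0 (rather than facets). It 
assumes the entity-level type field is not_analyzed and that the 
not_analyzed copy of the tuple key is mapped as a sub-field at 
tuples.key.raw; those names, and the entities/entity index and type names, 
are assumptions, so adjust them to the real mapping:

# Keys (and their tuple types) for person entities, restricted to keys
# starting with the prefix the user has typed so far ("n" here).
curl -XGET 'localhost:9200/entities/entity/_search' -d '{
  "size": 0,
  "query": { "filtered": { "filter": { "term": { "type": "person" } } } },
  "aggs": {
    "tuples": {
      "nested": { "path": "tuples" },
      "aggs": {
        "prefix_match": {
          "filter": { "prefix": { "tuples.key.raw": "n" } },
          "aggs": {
            "keys": {
              "terms": { "field": "tuples.key.raw" },
              "aggs": {
                "key_type": { "terms": { "field": "tuples.type" } }
              }
            }
          }
        }
      }
    }
  }
}'

Dropping the prefix filter returns the full union of keys, and each keys 
bucket carries a key_type sub-bucket with the associated tuple type. Whether 
this stays fast on the full entity index is exactly the trade-off that the 
dedicated field-configuration index is meant to avoid.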


