Re: Column Scan / table metadata

2013-09-18 Thread David Medinets
How would you define 'modestly-sized tables'? Are you thinking of an absolute number like 100 Billion entries or some number of entries per tablet? Or perhaps a time estimate - like a map-reduce job takes 60 minutes to scan the table? On Wed, Sep 18, 2013 at 2:57 PM, Josh Elser wrote: > There i

Re: [VOTE] Accumulo Instamo Archetype 1.4.4

2013-09-18 Thread Mike Drob
Ah, so one can make use of the other. Makes sense. Is there a source tar to verify? Or is the only vote for maven artifacts? On Sep 17, 2013 11:19 PM, "Josh Elser" wrote: > Well, a knock on how long this has been in the pipeline -- this was > actually begun before the maven-plugin even came abou

Re: Schedule for 1.6.0 release?

2013-09-18 Thread Billie Rinaldi
+1 On Sep 18, 2013 9:36 AM, "Keith Turner" wrote: > We do need to get this settled. What about end of year target for release > date and feature freeze date at end of Oct? > > > On Tue, Aug 27, 2013 at 4:26 PM, Mike Drob wrote: > > > I wanted to revive this conversation, since fall is fast appr

Re: Schedule for 1.6.0 release?

2013-09-18 Thread Corey Nolet
+1 On Sep 18, 2013 5:43 PM, "Mike Drob" wrote: > +1 with reservations. > > 1.5.0 initially planned for an end-of-year release, but that ended up > slipping much later. I'd like us to learn from that experience and come > down much more strictly on the feature freeze this time. > > > On Wed, Sep 1

Re: Schedule for 1.6.0 release?

2013-09-18 Thread Mike Drob
+1 with reservations. 1.5.0 initially planned for an end-of-year release, but that ended up slipping much later. I'd like us to learn from that experience and come down much more strictly on the feature freeze this time. On Wed, Sep 18, 2013 at 2:14 PM, Christopher wrote: > +1 > > -- > Christo

Re: Many locality groups

2013-09-18 Thread Josh Elser
For those curious, I ran some quick benchmarks. Scanning over all columns (loc groups) does appear to take a slight hit as you grow the number of locality groups, but it doesn't appear to be too painful: With 1.5.0 (no in-memory maps partitioning): {4 lgs => 4.5s, 16 lgs => 4.9s, 32 lgs => 5.5s} With

Re: Column Scan / table metadata

2013-09-18 Thread Keith Turner
On Wed, Sep 18, 2013 at 3:25 PM, Josh Elser wrote: > On Wed, Sep 18, 2013 at 3:15 PM, Keith Turner wrote: > > > On Wed, Sep 18, 2013 at 2:42 PM, Devin Pinkston < > devinfpinks...@gmail.com > > >wrote: > > > > > I have been looking through the Accumulo source to try and find the > best > > > way

Re: Column Scan / table metadata

2013-09-18 Thread Josh Elser
On Wed, Sep 18, 2013 at 3:15 PM, Keith Turner wrote: > On Wed, Sep 18, 2013 at 2:42 PM, Devin Pinkston >wrote: > > > I have been looking through the Accumulo source to try and find the best > > way to derive the column structure/metadata of a table. If I have a > table > > > > Metadata in RFile

Re: Column Scan / table metadata

2013-09-18 Thread Keith Turner
On Wed, Sep 18, 2013 at 2:42 PM, Devin Pinkston wrote: > I have been looking through the Accumulo source to try and find the best > way to derive the column structure/metadata of a table. If I have a table > Metadata in RFile contains some info about column families, but not column qualifiers.

Re: Column Scan / table metadata

2013-09-18 Thread Josh Elser
There isn't a reliable way to ascertain the column set for a table via the Accumulo API. Scanning all of the keys in a table would work; however, this quickly becomes too costly to perform for modestly sized tables. An easy way to manage this is to build up the set of columns as part of your "ing
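The suggestion above, building up the column set yourself instead of scanning the whole table, might look roughly like the sketch below. It is plain Java with no Accumulo dependency: `ColumnTracker` and its methods are hypothetical names invented for illustration, and where the set is persisted (a side table, a file, etc.) is left open because the original message does not say.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of the Accumulo API: remembers every
// (column family, column qualifier) pair seen while data is being written.
final class ColumnTracker {
    // Sorted so the column list comes back in a stable order.
    private final Set<String> columns = new TreeSet<>();

    // Call once per entry written. The "family:qualifier" encoding is an
    // assumption made for brevity; it is ambiguous if a family contains ':'.
    void record(String family, String qualifier) {
        columns.add(family + ":" + qualifier);
    }

    // Read-only view of the distinct columns recorded so far.
    Set<String> columns() {
        return Collections.unmodifiableSet(columns);
    }

    public static void main(String[] args) {
        ColumnTracker tracker = new ColumnTracker();
        tracker.record("attr", "name");
        tracker.record("attr", "age");
        tracker.record("attr", "name"); // duplicate, ignored by the set
        for (String column : tracker.columns()) {
            System.out.println(column);
        }
    }
}
```

The trade-off is that the column list is only as complete as the writers that report to it; anything written without going through the tracker will not show up.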

Column Scan / table metadata

2013-09-18 Thread Devin Pinkston
I have been looking through the Accumulo source to try and find the best way to derive the column structure/metadata of a table. If I have a table "sample", and I want to find all the column families/qualifiers, is there a built-in facility in Accumulo to get a list of columns in that table? Or w

Re: [VOTE] Accumulo Instamo Archetype 1.4.4

2013-09-18 Thread Christopher
That's right. The maven plugin is for testing Accumulo projects. The instamo archetype is a skeleton Accumulo project. The maven plugin would be something we'd want to add to future versions of instamo, so that the skeleton project already has a predefined integration test, using the plugin. -- Ch

Re: Combiner Iterator

2013-09-18 Thread Devin Pinkston
Thank you very much, I greatly appreciate it! Devin, The Iterator is the "magic" that Accumulo is doing underneath the hood. The Combiner is a neat construct because it performs this reduction server-side so that your client code doesn't need to. When you use a Combiner with a (Batch)Scanner or

Re: Schedule for 1.6.0 release?

2013-09-18 Thread Christopher
+1 -- Christopher L Tubbs II http://gravatar.com/ctubbsii On Wed, Sep 18, 2013 at 10:36 AM, Keith Turner wrote: > We do need to get this settled. What about end of year target for release > date and feature freeze date at end of Oct? > > > On Tue, Aug 27, 2013 at 4:26 PM, Mike Drob wrote: > >

Re: Many locality groups

2013-09-18 Thread Josh Elser
Neat! Glad to see I wasn't completely off base with some of the complexity numbers I was expecting. I'll pick up my poking and prodding where you left off. Thanks, Keith. On Wed, Sep 18, 2013 at 11:35 AM, Keith Turner wrote: > I ran some tests before and after partitioning tablet memory in > A

Re: Many locality groups

2013-09-18 Thread Keith Turner
I ran some tests before and after partitioning tablet memory in ACCUMULO-112. I commented on the performance numbers I saw. I checked in the code I used to test: test/src/main/java/org/apache/accumulo/test/IMMLGBenchmark.java Looking back at the test, one thing I did not time was reading all of

Re: Schedule for 1.6.0 release?

2013-09-18 Thread Josh Elser
On Wed, Sep 18, 2013 at 11:16 AM, Sean Busbey wrote: > +1 for "Feature Freeze" means only bug fixes happen after the date, which > implies major code additions and changes are already in place with > appropriate tests. > > I presume on feature freeze date we'll handle step 1 of the proposed git >

Re: [VOTE] Accumulo Instamo Archetype 1.4.4

2013-09-18 Thread Keith Turner
To try this out, I added the staging repo to a profile called test144 in my .m2/settings file and ran the following command, which generated an instamo directory. mvn -P test144 archetype:generate -DarchetypeGroupId=org.apache.accumulo -DarchetypeArtifactId=instamo-archetype -DarchetypeVersion=1.4.4

Re: Schedule for 1.6.0 release?

2013-09-18 Thread Sean Busbey
+1 for "Feature Freeze" means only bug fixes happen after the date, which implies major code additions and changes are already in place with appropriate tests. I presume on feature freeze date we'll handle step 1 of the proposed git workflow for releases (branch to 1.6.0-SNAPSHOT and increment ver

Many locality groups

2013-09-18 Thread Josh Elser
I have a use case in which I'm investigating setting a locality group on every column family in a table which has very "dense" rows (many columns appear within the same tablet). When scanning over a single column, I see a slow-down as one might expect (filtering out the columns I don't care about)

Re: Schedule for 1.6.0 release?

2013-09-18 Thread Eric Newton
I'm ok to leave some documentation tasks for the test period, but all the new features should have unit tests, integration tests, and RW tests. On Wed, Sep 18, 2013 at 10:48 AM, Josh Elser wrote: > +1 by the end of the year with adequate time for testing. > > Just so we're all clear (because I

Re: Schedule for 1.6.0 release?

2013-09-18 Thread Josh Elser
+1 by the end of the year with adequate time for testing. Just so we're all clear (because I don't remember where we ended up either), feature freeze means "all features tagged for 1.6.0 must be finished", right? In other words, if I had something planned for 1.6.0 that I haven't started, I need t

Re: Schedule for 1.6.0 release?

2013-09-18 Thread Eric Newton
+1 I absolutely need to have multiple volume support in a release by the end-of-year. On Wed, Sep 18, 2013 at 10:36 AM, Keith Turner wrote: > We do need to get this settled. What about end of year target for release > date and feature freeze date at end of Oct? > > > On Tue, Aug 27, 2013 at 4

Re: Schedule for 1.6.0 release?

2013-09-18 Thread Keith Turner
We do need to get this settled. What about end of year target for release date and feature freeze date at end of Oct? On Tue, Aug 27, 2013 at 4:26 PM, Mike Drob wrote: > I wanted to revive this conversation, since fall is fast approaching. One > reasonable target for a release date might be to

Re: [VOTE] Accumulo Instamo Archetype 1.4.4

2013-09-18 Thread Keith Turner
On Tue, Sep 17, 2013 at 11:10 PM, Mike Drob wrote: > Josh, > > What role does this fill that the Accumulo maven plugin does not? They seem > AFAIK the maven plugin does not provide an example pom and example code to get a user quickly started. This is provided by the archetype. The archetype f

Re: Combiner Iterator

2013-09-18 Thread Josh Elser
Devin, The Iterator is the "magic" that Accumulo is doing underneath the hood. The Combiner is a neat construct because it performs this reduction server-side so that your client code doesn't need to. When you use a Combiner with a (Batch)Scanner or configured on a table, you specify what granula

Re: Combiner Iterator

2013-09-18 Thread Miguel Pereira
So, the value iterator is in the abstract class Combiner, the superclass of StatsCombiner. It automatically passes the ValueIterator to the reduce method in the findTop() method: > Iterator viter = new ValueIterator(getSource()); > topValue = reduce(topKey, viter); > this
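To make the division of labour concrete, here is a standalone model of the pattern described above. It does not use Accumulo classes: `CombinerSketch`, the `String` key, and the `List<Long>` source are stand-ins chosen for illustration. The point it tries to show is that the framework side (`findTop()` in the quoted snippet) builds the values-only iterator and hands it to `reduce`, so code that implements `reduce` never constructs that iterator itself.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Standalone model of the Combiner pattern; all names here are hypothetical.
final class CombinerSketch {
    // What a subclass supplies: fold all values seen for one key into one result.
    static long reduce(String key, Iterator<Long> values) {
        long sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    // Stand-in for the framework step: wrap the source in an iterator over
    // values only, then delegate to reduce() together with the current key.
    static long findTop(String topKey, List<Long> source) {
        Iterator<Long> viter = source.iterator();
        return reduce(topKey, viter);
    }

    public static void main(String[] args) {
        long top = findTop("row1 cf:cq", Arrays.asList(1L, 2L, 3L));
        System.out.println(top); // prints 6
    }
}
```

In this model the caller only ever invokes `findTop`; `reduce` is called for it, which mirrors the explanation that the value iterator is passed in automatically.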

Combiner Iterator

2013-09-18 Thread Devin Pinkston
Hello, I am trying to work with the example combiner iterator through Java code instead of the jar or shell. My question is: how do I pass in the Iterator to the reduce method? Usually I would create a Key Value Iterator, but this requires an Iterator just over the Value, and then the key to be p