Re: Making a RowCounterIterator

2016-07-15 Thread William Slacum
The iterator in the gist also counts cells/entries/KV pairs, not unique rows. You'll want to have some way to skip to the next row value if you want the count to be reflective of the number of rows being read. On Fri, Jul 15, 2016 at 3:34 PM, Shawn Walker wrote: > My
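
To illustrate the difference between counting entries and counting rows, here is a minimal sketch in plain Java. It does not use the Accumulo iterator API; `Cell`, `countEntries`, and `countRows` are hypothetical names, and the input is assumed to arrive sorted by row, as keys do in a scan.

```java
import java.util.Arrays;
import java.util.List;

class RowCountSketch {
    /** Hypothetical stand-in for an Accumulo key: just a row and a column. */
    static final class Cell {
        final String row;
        final String column;
        Cell(String row, String column) { this.row = row; this.column = column; }
    }

    /** Counts every cell, which is what an entry-counting iterator reports. */
    static long countEntries(List<Cell> sortedCells) {
        return sortedCells.size();
    }

    /** Counts distinct rows; relies on the cells being sorted by row. */
    static long countRows(List<Cell> sortedCells) {
        long rows = 0;
        String previousRow = null;
        for (Cell cell : sortedCells) {
            // Only the first cell of each row bumps the count
            if (!cell.row.equals(previousRow)) {
                rows++;
                previousRow = cell.row;
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Cell> cells = Arrays.asList(
            new Cell("row1", "a"), new Cell("row1", "b"),
            new Cell("row2", "a"),
            new Cell("row3", "a"), new Cell("row3", "b"), new Cell("row3", "c"));
        System.out.println("entries=" + countEntries(cells) + " rows=" + countRows(cells));
    }
}
```

A real iterator would more likely seek past the remainder of the current row rather than visit each cell, but the counting logic is the same idea.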

Re: Unable to import RFile produced by AccumuloFileOutputFormat

2016-07-08 Thread William Slacum
I wonder if the file isn't being decrypted properly. I don't see why it would write out incompatible file versions. On Fri, Jul 8, 2016 at 3:02 PM, Josh Elser wrote: > Interesting! I have not run into this one before. > > You could use `accumulo rfile-info`, but I'd guess

Re: java.lang.NoClassDefFoundError with fields of custom Filter

2016-07-07 Thread William Slacum
You could also shade/relocate dependency classes within the uber/fat jar. It has pitfalls but it is very easy to set up. On Thursday, July 7, 2016, Massimilian Mattetti wrote: > Hi Jim, > > the approach of using namespace from HDFS looks promising. I need to > investigate a

Re: [ANNOUNCE] Fluo 1.0.0-beta-2 is released

2016-01-19 Thread William Slacum
Cool beans, Keith! On Tue, Jan 19, 2016 at 11:30 AM, Keith Turner wrote: > The Fluo project is happy to announce a 1.0.0-beta-2[1] release which is > the > third release of Fluo and likely the final release before 1.0.0. Many > improvements in this release were driven by the

Re: compression of keys for a sequential scan over an inverted index

2015-10-26 Thread William Slacum
Thanks, Jonathan! I've wondered about specific numbers on this topic when dealing with geohashes, so this is a very useful tool. On Sun, Oct 25, 2015 at 11:22 AM, Jonathan Wonders wrote: > I have been able to put some more thought into this over the weekend and > make

Re: Watching for Changes with Write Ahead Log?

2015-10-01 Thread William Slacum
Soup gave a talk about something down this alley: https://www.youtube.com/watch?v=aedejUXWrV0 On Thu, Oct 1, 2015 at 2:58 PM, Keith Turner wrote: > Could possibly use a ThreadLocal containing a SoftReference > > Another place you could possibly put this code instead of in a

Re: Question about configuring the linux niceness of tablet servers?

2015-08-17 Thread William Slacum
By Hadoop do you mean a Yarn NodeManager process? On Mon, Aug 17, 2015 at 4:21 PM, Jeff Kubina jeff.kub...@gmail.com wrote: On each of the processing nodes in our cluster we have running 1) HDFS (datanode), 2) Accumulo (tablet server), and 3) Hadoop. Since Accumulo depends on the HDFS, and

Origin of hive.auto.convert.sortmerge.join.noconditionaltask

2015-08-04 Thread William Slacum
Hi all, I've had some questions from users regarding setting `hive.auto.convert.sortmerge.join.noconditionaltask`. I see, in some documentation from users and vendors, that it is recommended to set this parameter. In neither Hive 0.12 nor 0.14 can I find in HiveConf where this is actually defined

Re: Origin of hive.auto.convert.sortmerge.join.noconditionaltask

2015-08-04 Thread William Slacum
You are correct sir! On Tue, Aug 4, 2015 at 3:42 PM, Josh Elser josh.el...@gmail.com wrote: Might you have meant to send this to u...@hive.apache.org? William Slacum wrote: Hi all, I've had some questions from users regarding setting `hive.auto.convert.sortmerge.join.noconditionaltask

Re: How to control Minor Compaction by programming

2015-07-30 Thread William Slacum
Swap out 1.5 in the previous link for the version you're probably using. Which charts are you looking at for the compactions? Usually it's just the number of compactions currently running for the system. On Thu, Jul 30, 2015 at 7:10 PM, William Slacum wsla...@gmail.com wrote: See http

Re: How to control Minor Compaction by programming

2015-07-30 Thread William Slacum
See http://accumulo.apache.org/1.5/apidocs/org/apache/accumulo/core/client/admin/TableOperations.html#flush%28java.lang.String,%20org.apache.hadoop.io.Text,%20org.apache.hadoop.io.Text,%20boolean%29 for minor compacting (aka flushing) a table via the API. On Thu, Jul 30, 2015 at 5:52 PM, Hai

Re: AccumuloInputFormat with pyspark?

2015-07-15 Thread William Slacum
Look in ConfiguratorBase for how it converts enums to config keys. These are the two methods that are used: /** * Provides a configuration key for a given feature enum, prefixed by the implementingClass * * @param implementingClass * the class whose name will be used as a

Re: Abnormal behaviour of custom iterator in getting entries

2015-06-12 Thread William Slacum
What do you mean by multiple entries? Are you doing something similar to the WholeRowIterator, which encodes all the entries for a given row into a single key value? Are you using any other iterators? In general, calls to `hasTop()`, `getTopKey()` and `getTopValue()` should not change the state

Re: Getting InterruptedException

2015-06-03 Thread William Slacum
What does your code look like? I've seen issues where I have some code of the form: BatchScanner s = connector.createBatchScanner(...); for(Entry e : s) { System.out.println(e); } This usually results in an InterruptedException because the TabletServerBatchReaderIterator doesn't seem to have a

Re: [ANNOUNCE] Fluo 1.0.0-alpha-1 Released

2014-10-09 Thread William Slacum
woohoo Look forward to getting to use this! On Thu, Oct 9, 2014 at 4:54 PM, Corey Nolet cjno...@gmail.com wrote: The Fluo project is happy to announce the 1.0.0-alpha-1 release of Fluo. Fluo is a transaction layer that enables incremental processing on top of Accumulo. It integrates into

Re: Using iterators to generate data

2014-08-30 Thread William Slacum
This comes up a bit, so maybe we should add it to the FAQ (or just have better information about iterators in general). The short answer is that it's usually not recommended, because there aren't strong guarantees about the lifetime of an iterator (so we wouldn't know when to close any resources

Re: Optimal # proxy servers

2014-08-11 Thread William Slacum
Going through the proxy will always be an extra RPC step over using a Java client. Eliminating that step, I think, would net the most benefit. On Mon, Aug 11, 2014 at 12:16 AM, John R. Frank j...@diffeo.com wrote: Josh, Following up on this earlier post about the proxy:

Re: 'scanner closed' error

2014-08-03 Thread William Slacum
I have seen issues if I don't have an explicit close on the batch scanner. When I don't have the close, the gc ends up calling `finalize()` which closes the thread pool. Basically, the work around is to manage the lifetime of the instance yourself, rather than leave it up to fate. On Sun, Aug 3,
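
The "manage the lifetime yourself" advice can be sketched with a try/finally. `FakeScanner` below is a hypothetical stand-in, not the Accumulo `BatchScanner`; it only mimics a resource that fails once closed, so the example runs without a cluster.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ExplicitCloseSketch {
    /** Hypothetical stand-in for a batch scanner; not the Accumulo class. */
    static final class FakeScanner implements Iterable<String> {
        private final List<String> entries;
        private boolean closed = false;

        FakeScanner(List<String> entries) { this.entries = entries; }

        @Override
        public Iterator<String> iterator() {
            if (closed) {
                throw new IllegalStateException("scanner closed");
            }
            return entries.iterator();
        }

        void close() { closed = true; }

        boolean isClosed() { return closed; }
    }

    /** Reads every entry, then closes the scanner even if iteration throws. */
    static int readAll(FakeScanner scanner) {
        int count = 0;
        try {
            for (String entry : scanner) {
                count++;
            }
        } finally {
            // Close deterministically instead of waiting for finalize()
            scanner.close();
        }
        return count;
    }

    public static void main(String[] args) {
        FakeScanner scanner = new FakeScanner(Arrays.asList("a", "b", "c"));
        System.out.println("read " + readAll(scanner) + ", closed=" + scanner.isClosed());
    }
}
```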

Re: Z-Curve/Hilbert Curve

2014-07-24 Thread William Slacum
Quick google search yielded: https://github.com/GeoLatte/geolatte-geom/blob/master/src/main/java/org/geolatte/geom/curve/MortonCode.java On Thu, Jul 24, 2014 at 10:10 AM, THORMAN, ROBERT D rt2...@att.com wrote: Can anyone share a Java method to convert lat/lon (decimal degrees) to Z-Curve
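
For readers who just want the idea, here is a minimal Z-curve (Morton code) sketch in plain Java. It is not the GeoLatte implementation linked above; the class and method names are hypothetical, and the choice to put longitude in the even bits and latitude in the odd bits is an assumption made for illustration.

```java
class MortonSketch {
    /** Scales a value in [min, max] to an integer in [0, 2^bits - 1]. */
    static long quantize(double value, double min, double max, int bits) {
        double clamped = Math.max(min, Math.min(max, value));
        long cells = 1L << bits;
        long q = (long) ((clamped - min) / (max - min) * cells);
        // The upper bound maps to `cells`, so pull it back into range
        return Math.min(q, cells - 1);
    }

    /** Interleaves the low `bits` bits of x and y: x in even positions, y in odd. */
    static long interleave(long x, long y, int bits) {
        long code = 0;
        for (int i = 0; i < bits; i++) {
            code |= ((x >> i) & 1L) << (2 * i);
            code |= ((y >> i) & 1L) << (2 * i + 1);
        }
        return code;
    }

    /** Encodes decimal-degree lat/lon into a Z-curve value; bits should be at most 31. */
    static long encode(double lat, double lon, int bits) {
        long x = quantize(lon, -180.0, 180.0, bits);
        long y = quantize(lat, -90.0, 90.0, bits);
        return interleave(x, y, bits);
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(encode(38.8977, -77.0365, 16)));
    }
}
```

If the result is used as a row ID, it would typically be written as a fixed-width, zero-padded string or as big-endian bytes so that the sort order of the keys matches the numeric order of the codes.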

Re: Iterating/Aggregating/Combining Complex (Java POJO/Avro) Values

2014-07-15 Thread William Slacum
[tserver.TabletServer] DEBUG: ScanSess tid 10.0.2.15:44992 8 *0 entries* in 0.01 secs, nbTimes = [6 6 6.00 1] No exceptions otherwise. Really appreciate all the ongoing help. Best, -Mike On Mon, Jul 14, 2014 at 6:40 PM, William Slacum wilhelm.von.cl...@accumulo.net wrote: Anything in your Tserver

Re: Iterating/Aggregating/Combining Complex (Java POJO/Avro) Values

2014-07-14 Thread William Slacum
Hi Mike! The Combiner interface is only for aggregating keys within a single row. You can probably get away with implementing your combining logic in a WrappingIterator that reads across all the rows in a given tablet. To do some combine/fold/reduce operation, Accumulo needs the input type to be

Re: Iterating/Aggregating/Combining Complex (Java POJO/Avro) Values

2014-07-14 Thread William Slacum
For a bit of pseudocode, I'd probably make a class that did something akin to: http://pastebin.com/pKqAeeCR I wrote that up real quick in a text editor-- it won't compile or anything, but should point you in the right direction. On Mon, Jul 14, 2014 at 3:44 PM, William Slacum wilhelm.von.cl

Re: Iterating/Aggregating/Combining Complex (Java POJO/Avro) Values

2014-07-14 Thread William Slacum
scan* *root@dev pojo* Best, -Mike On Mon, Jul 14, 2014 at 4:07 PM, William Slacum wilhelm.von.cl...@accumulo.net wrote: For a bit of pseudocode, I'd probably make a class that did something akin to: http://pastebin.com/pKqAeeCR I wrote that up real quick in a text editor-- it won't

Re: Iterating/Aggregating/Combining Complex (Java POJO/Avro) Values

2014-07-14 Thread William Slacum
thoughts? What's the best way to debug these? On Mon, Jul 14, 2014 at 5:14 PM, William Slacum wilhelm.von.cl...@accumulo.net wrote: Ah, an artifact of me just willy nilly writing an iterator :) Any reference to `this.source` should be replaced

Re: Forgot SECRET, how to delete zookeeper nodes?

2014-07-13 Thread William Slacum
If the zookeeper data is gone, your best bet is to try and identify which directories under /accumulo/tables point to which tables you had. You can then bulk import the files into a new instance's tables. On Sun, Jul 13, 2014 at 11:54 PM, Vicky Kak vicky@gmail.com wrote: I am not sure if the

Re: Mapreduce output format killing tablet servers

2014-06-25 Thread William Slacum
I had a similar thread going on and am currently rummaging through the batch writer code (as well as pontificating on how the tablet server handles multiple write clients for the tablet). What is your ingest skew like? Is it uniform? How quickly do splits occur? I've seen, at relatively low

Re: BatchWriter woes

2014-06-24 Thread William Slacum
I can try to confirm that, but the monitor isn't showing any failures during ingest. By half dead do you mean the master thinks it is alive, but in actuality it isn't? On Fri, Jun 20, 2014 at 10:32 AM, Keith Turner ke...@deenlo.com wrote: On Thu, Jun 19, 2014 at 11:57 PM, William Slacum

Re: Meaning of in METADATA table [SEC=UNOFFICIAL]

2014-06-24 Thread William Slacum
is a byte used for doing an ordering on rows that share the same prefix. There was a presentation floating around on the specifics of the metadata table at one point. I believe that helps tablet information sort before the last tablet, which is suffixed with '~', to force it to sort after the

Re: How does Accumulo compare to HBase

2014-06-23 Thread William Slacum
I think first and foremost, how has writing your application been? Is it something you can easily onboard other people for? Does it seem stable enough? If you can answer those questions positively, I think you have a winning situation. The big three Hadoop vendors (Cloudera, Hortonworks and MapR)

BatchWriter woes

2014-06-19 Thread William Slacum
I'm finding some ingest jobs I have running in a bit of a sticky sitch: I have a MapReduce job that reads a table, transforms the entries, creates an inverted index, and writes out mutations to two tables. The cluster size is in the tens of nodes, and I usually have 32 mappers running. The batch

Re: [DISCUSS] Should we support upgrading 1.4 -> 1.6 w/o going through 1.5?

2014-06-16 Thread William Slacum
How much of this is a standalone utility? I think a magic button approach would be good for this case. On Mon, Jun 16, 2014 at 5:24 PM, Sean Busbey bus...@cloudera.com wrote: In an effort to get more users off of our now unsupported 1.4 release, should we support upgrading directly to 1.6

Re: Unable to load Iterator with setscaniter and setshelliter

2014-06-15 Thread William Slacum
Wouldn't the iterator have to be on the classpath for the JVM that launches the shell command? On Sun, Jun 15, 2014 at 9:02 AM, Vicky Kak vicky@gmail.com wrote: setiter -n MyIterator -p 10 -scan -minc -majc -class com.codebits.d4m.iterator.MyIterator scan The above line fails for me

Re: Unable to load Iterator with setscaniter and setshelliter

2014-06-15 Thread William Slacum
is throwing that Exception? On Jun 15, 2014 8:50 AM, William Slacum wilhelm.von.cl...@accumulo.net wrote: Wouldn't the iterator have to be on the classpath for the JVM that launches the shell command? On Sun, Jun 15, 2014 at 9:02 AM, Vicky Kak vicky@gmail.com wrote: setiter -n

Re: Improving Batchscanner Performance

2014-05-20 Thread William Slacum
By blocking, we mean you have to complete the entire index look up before fetching your records. Conceptually, instead of returning a `Collection<Text> rows`, return an `Iterator<Text> rows` and consume them in batches as the first look up produces them. That way record look ups can occur in parallel
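
The batch-consumption idea can be sketched in plain Java. This is only an illustration under stated assumptions: row IDs are plain `String`s rather than Hadoop `Text`, the class and method names are hypothetical, and the actual record look up (for example, handing each batch to a scanner as ranges) is left as a comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class BatchedLookupSketch {
    /** Pulls up to batchSize row IDs from the index iterator without draining it. */
    static List<String> nextBatch(Iterator<String> rows, int batchSize) {
        List<String> batch = new ArrayList<>(batchSize);
        while (batch.size() < batchSize && rows.hasNext()) {
            batch.add(rows.next());
        }
        return batch;
    }

    /** Walks the index results batch by batch and returns the batches in order. */
    static List<List<String>> consumeInBatches(Iterator<String> rows, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        while (rows.hasNext()) {
            // In a real client, each batch would be turned into record look ups here,
            // while the index iterator keeps producing the next rows
            batches.add(nextBatch(rows, batchSize));
        }
        return batches;
    }

    public static void main(String[] args) {
        Iterator<String> indexHits = Arrays.asList("r1", "r2", "r3", "r4", "r5").iterator();
        System.out.println(consumeInBatches(indexHits, 2));
    }
}
```

Because the caller holds an iterator rather than a finished collection, the first record look up can start as soon as the first batch is full.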

Re: Delete All Data In Table

2014-05-12 Thread William Slacum
You could save the splits, delete the table, then reapply the splits. On Mon, May 12, 2014 at 9:23 AM, BlackJack76 justin@gmail.com wrote: Besides using the tableOperations to deleteRows or delete the table entirely, what is the fastest way to delete all data in a table? I am currently

Re: Common Big Data Architecture Writeup

2014-04-29 Thread William Slacum
You could do mutations or bulk loading. As long as you can phrase your data in terms of keys and values, you can store it in Accumulo. On Tue, Apr 29, 2014 at 1:48 PM, Geoffry Roberts threadedb...@gmail.com wrote: David started this thread yesterday. Since then I have read everything, I

Re: Write to table from Accumulo iterator

2014-04-25 Thread William Slacum
Our own Keith Turner is trying to make this possible with Accismus ( https://github.com/keith-turner/Accismus). I don't know the current state of it, but I believe it's still in the early stages. I've always been under the impression that launching a scanner or writer from within an iterator, as

Re: Embedded Mutations: Is this kind of thing done?

2014-04-24 Thread William Slacum
Depending on your table schema, you'll probably want to translate an object graph into multiple mutations. On Thu, Apr 24, 2014 at 8:40 PM, David Medinets david.medin...@gmail.com wrote: If the sub-document changes, you'll need to search the values of every Accumulo entry? On Thu, Apr 24,

Re: bulk ingest without mapred

2014-04-08 Thread William Slacum
java.io.FileNotFoundException: File does not exist: bulk/entities_fails/failures sticks out to me. it looks like a relative path. where does that directory exist on your file system? On Tue, Apr 8, 2014 at 9:40 AM, pdread paul.r...@siginttech.com wrote: Hi I interface to an accumulo cloud

Re: bulk ingest without mapred

2014-04-08 Thread William Slacum
The extension is .rf. Are you using an RFile.Writer? On Tue, Apr 8, 2014 at 1:29 PM, pdread paul.r...@siginttech.com wrote: Josh As I had stated in one of my previous posts I am using FileSystem. I am using the code from the MapReduce bulk ingest without the MapReduce. I did feed the

Re: NOT operator in visibility string

2014-03-08 Thread William Slacum
Thanks, Joe! On Fri, Mar 7, 2014 at 2:01 PM, joeferner joe.m.fer...@gmail.com wrote: Submitted the patch here: ACCUMULO-2439 https://issues.apache.org/jira/browse/ACCUMULO-2439 -- View this message in context:

Re: Synchronized Access to ZooCache Causing Threads to Block

2014-02-12 Thread William Slacum
FWIW you can probably avoid the scan by making your insert idempotent aside from the timestamp and let versioning handle deduplication. On Wed, Feb 12, 2014 at 1:19 PM, Ariel Valentin ar...@arielvalentin.com wrote: Sorry but I am not at liberty to be specific about our business problem.

Re: scanner question in regards to columns loaded

2014-01-26 Thread William Slacum
Filters (and more generally, iterators) are executed on the server. There is an option to run them client side. See http://accumulo.apache.org/1.4/apidocs/org/apache/accumulo/core/client/ClientSideIteratorScanner.html Using fetchColumnFamily will return only keys that have specific column family

Re: ISAM file location vs. read performance

2014-01-12 Thread William Slacum
Some data on short circuit reads would be great to have. I'm unsure of how correct the compaction leading to eventual locality postulation is. It seems, to me at least, that in the case of a multi-block file, the file system would eventually try to distribute those blocks rather than leave them

Re: How to remove entire row at the server side?

2013-11-05 Thread William Slacum
If an iterator is only set at scan time, then its logic will only be applied when a client scans the table. The data will persist through major and minor compaction and be visible if you scanned the RFile(s) backing the table. Suppress is the better word in this case. Would you please open a

Re: [DISCUSS] Hadoop 2 and Accumulo 1.6.0

2013-10-23 Thread William Slacum
There weren't any discussions in those tickets as to what Hadoop 2 provides Accumulo. If we're going to still support 1, then any new features only possible with 2 have to become optional until we ditch support for 1. Is there anything people have in mind, feature wise, that Hadoop 2 would help

Re: Trouble with IntersectingIterator

2013-10-01 Thread William Slacum
That iterator is designed to be used with a sharded table format, wherein the index and record each occur within the same row. See the Accumulo examples page http://accumulo.apache.org/1.4/examples/shard.html On Tue, Oct 1, 2013 at 3:35 PM, Heath Abelson habel...@netcentricinc.com wrote: I am

Re: Intersecting Iterators [SEC=UNCLASSIFIED]

2013-08-14 Thread William Slacum
Usually the intersecting iterator is used when you're modeling a document partitioned table. That is, you have relatively few row values compared to the number of documents you're storing (like, on the order of hundreds to millions of documents in a single row). It looks like you have a single row

Re: How to efficiently find lexicographically adjacent records?

2013-08-07 Thread William Slacum
Finding the keys after your hypothetical key is easy, as you can just make it the first key in the range you pass to your Scanner. Since accumulo doesn't do backwards scanning, you might have to consider having two tables or sets of rows, one that sorts lexicographically and the other that sorts
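
One way to get a second, oppositely ordered set of rows is to complement each byte of the key before writing it. The sketch below is an assumption about what such an encoding could look like, not something taken from the thread; it only holds for keys of equal length, and the names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class ReverseKeySketch {
    /** Complements every byte, which reverses unsigned byte order for equal-length keys. */
    static byte[] invert(byte[] key) {
        byte[] out = new byte[key.length];
        for (int i = 0; i < key.length; i++) {
            out[i] = (byte) ~key[i];
        }
        return out;
    }

    /** Unsigned, byte-wise lexicographic comparison. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] a = "aaa".getBytes(StandardCharsets.UTF_8);
        byte[] b = "aab".getBytes(StandardCharsets.UTF_8);
        System.out.println("forward:  " + Integer.signum(compareUnsigned(a, b)));
        System.out.println("inverted: " + Integer.signum(compareUnsigned(invert(a), invert(b))));
    }
}
```

With both encodings stored, a forward scan starting at the inverted form of a key walks the entries that precede it in the original ordering.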

Re: Improving ingest performance [SEC=UNCLASSIFIED]

2013-07-24 Thread William Slacum
There can also be significant overhead in starting a MR job if you're using `-libjars` for distributing your dependencies. This effect is more pronounced as the number of nodes increases. I would recommend looking into the distributed cache (there's a quick description at

Re: Accumulo / HBase migration

2013-07-09 Thread William Slacum
We could also just add a transformation from HFileReader -> LocalityGroupReader, since I think HBase's storage model (forgive me if there's a better term) maps pretty well to that. On Tue, Jul 9, 2013 at 2:20 PM, dlmar...@comcast.net wrote: I believe that Brian Loss committed code in 1.5 for a

Re: Preferred method for a client to obtain a connector reference

2013-05-30 Thread William Slacum
There's an almost identical method that, instead of a CharSequence or byte[], takes an AuthorizationToken object. If you're using user/password, use a PasswordToken (I think that's the name of the object). On Thu, May 30, 2013 at 4:00 PM, Newman, Elise enew...@integrity-apps.com wrote: Okay, I

Re: Wikisearch Performance Question

2013-05-21 Thread William Slacum
According to https://issues.apache.org/jira/browse/HADOOP-7823 , it should be possible to split bzip2 files in Hadoop 1.1. On Tue, May 21, 2013 at 3:54 PM, Eric Newton eric.new...@gmail.com wrote: The files decompress remarkably fast, too. I seem to recall about 8 minutes on our hardware. I

Iterators returning keys out of scan range

2013-05-01 Thread William Slacum
I was always under the impression there was a check, presumably on the client side, that would end a scan session if a key was returned that was not in the original scan range. Say I scanned my table for the range [A, B], but I had an iterator that returned only keys beginning with C. I would

Re: Iterators returning keys out of scan range

2013-05-01 Thread William Slacum
Sorry guys, I forgot to add some methods to the iterator to make it work. http://pastebin.com/pXR5veP6 On Wed, May 1, 2013 at 8:01 PM, William Slacum wilhelm.von.cl...@accumulo.net wrote: I was always under the impression there was a check, presumably on the client side, that would end a scan

Re: Iterator name already in use with AccumuloInputFormat?

2013-04-11 Thread William Slacum
And it uses the `IteratorSetting(int height, Class<?> iterator)` constructor, so the name of the iterator is the class itself. Naming your iterator should be a short term fix. I created ACCUMULO-1267 to make a smarter input format. On Thu, Apr 11, 2013 at 2:22 PM, William Slacum wilhelm.von.cl

Re: [VOTE] accumulo-1.4.3 RC2

2013-03-18 Thread William Slacum
The build hangs in cloudtrace for me on Mac OS 10.7.5, oddly enough on a TSocket creation. I thought it was due to me having Thrift 0.9 installed, but I can't see it getting picked up when I try to build via `mvn -X...`, only thrift-0.6.1. Anyone else run into the same thing? I'm not too worried

Re: [VOTE] accumulo-1.4.3 RC2

2013-03-14 Thread William Slacum
As an aside, do we keep track of the ingest and query rates with each release? I know Josh had a bit of a side project to do it nightly, but it'd be interesting to check whether or not as the project grows, we aren't making noticeable trade offs in performance. On Thu, Mar 14, 2013 at 10:36 AM,

Re: Mappers for Accumulo

2013-03-11 Thread William Slacum
So you want both auto adjusting and not auto adjusting depending on the size of a range? I suppose you could lift the code for doing the adjusting, and do some introspection on the ranges (such as how many tablets do I have in this range?) and apply as necessary. On Mon, Mar 11, 2013 at 4:47 PM,

Re: Running Helloworld from different host

2012-12-21 Thread William Slacum
On your accumulo master, what do you have in your conf/slaves file? On Fri, Dec 21, 2012 at 9:43 AM, Kevin Pauli ke...@thepaulis.com wrote: Hi, I'm trying to get my first Accumulo environment setup to evaluate it. I've got it running within a CentOS VM, and I've setup the helloworld data.

Re: How to store numerics or dates as values in Accumulo?

2012-12-21 Thread William Slacum
Rya is a triple store backed by Accumulo: http://www.deepdyve.com/lp/association-for-computing-machinery/rya-a-scalable-rdf-triple-store-for-the-clouds-7Xh905FY0y On Fri, Dec 21, 2012 at 2:01 PM, Keith Turner ke...@deenlo.com wrote: Take a look at the Typo Lexicoders. A Lexicoder serializes

Re: Satisfying Zookeper dependency when installing Accumulo in CentOS

2012-12-19 Thread William Slacum
Did you set ZOOKEEPER_HOME in the accumulo-env.sh script or your environment? On Wed, Dec 19, 2012 at 2:03 PM, Kevin Pauli ke...@thepaulis.com wrote: I'm trying to install Accumulo in CentOS. I have installed the jdk and hadoop, but can't seem to make Accumulo install happy wrt zookeeper. I

Re: Satisfying Zookeper dependency when installing Accumulo in CentOS

2012-12-19 Thread William Slacum
Nvm you're a step behind where I thought you were at. Turns out I'm of no help :) On Wed, Dec 19, 2012 at 2:06 PM, William Slacum wilhelm.von.cl...@accumulo.net wrote: Did you set ZOOKEEPER_HOME in the accumulo-env.sh script or your environment? On Wed, Dec 19, 2012 at 2:03 PM, Kevin Pauli

Re: Reduce task failing on job with error java.lang.IllegalStateException: Keys appended out-of-order

2012-12-06 Thread William Slacum
'col16' sorts lexicographically before 'col3'. you'll either need to encode your numerics or zero pad them. On Thu, Dec 6, 2012 at 9:03 AM, Andrew Catterall catteralland...@googlemail.com wrote: Hi, I am trying to run a bulk ingest to import data into Accumulo but it is failing at the
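
A quick way to see the ordering problem and the zero-padding fix, in plain Java. With ordinary string comparison "col16" comes before "col3", because '1' is less than '3'; padding the numeric part to a fixed width makes string order agree with numeric order. The `paddedColumn` helper and the width of 4 are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Locale;

class ZeroPadSketch {
    /** Builds a column name whose numeric suffix is zero padded to a fixed width. */
    static String paddedColumn(int n, int width) {
        return String.format(Locale.ROOT, "col%0" + width + "d", n);
    }

    public static void main(String[] args) {
        // Unpadded names: string order disagrees with numeric order
        String[] plain = {"col3", "col16"};
        Arrays.sort(plain);
        System.out.println(Arrays.toString(plain));

        // Padded names: string order now follows numeric order
        String[] padded = {paddedColumn(16, 4), paddedColumn(3, 4)};
        Arrays.sort(padded);
        System.out.println(Arrays.toString(padded));
    }
}
```

The width has to be chosen to cover the largest number you expect, since a value that overflows it would sort out of order again.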

Re: Performance of table with large number of column families

2012-11-09 Thread William Slacum
That shouldn't be a huge issue. How many rows/partitions do you have? How many do you have to scan to find the specific column family/doc id you want? On Fri, Nov 9, 2012 at 11:26 AM, Anthony Fox adfaccu...@gmail.com wrote: I have a table set up to use the intersecting iterator pattern. The

Re: Performance of table with large number of column families

2012-11-09 Thread William Slacum
9, 2012 at 11:39 AM, William Slacum wilhelm.von.cl...@accumulo.net wrote: That shouldn't be a huge issue. How many rows/partitions do you have? How many do you have to scan to find the specific column family/doc id you want? On Fri, Nov 9, 2012 at 11:26 AM, Anthony Fox adfaccu...@gmail.com

Re: Performance of table with large number of column families

2012-11-09 Thread William Slacum
for both index entries and record entries. Could this be the issue? Each record entry has approximately 30 column qualifiers with data in the value for each. On Fri, Nov 9, 2012 at 11:41 AM, William Slacum wilhelm.von.cl...@accumulo.net wrote: I guess assuming you have 10M possible

Re: Performance of table with large number of column families

2012-11-09 Thread William Slacum
? On Fri, Nov 9, 2012 at 11:49 AM, William Slacum wilhelm.von.cl...@accumulo.net wrote: I'm more inclined to believe it's because you have to search across 10M different rows to find any given column family, since they're randomly, and possibly uniformly, distributed. How many tablets

Re: Performance of table with large number of column families

2012-11-09 Thread William Slacum
the same for the scan I am doing? On Fri, Nov 9, 2012 at 12:02 PM, William Slacum wilhelm.von.cl...@accumulo.net wrote: So that means you have roughly 312.5k rows per tablet, which means about 725k column families in any given tablet. The intersecting iterator will work at a row per time, so I

Re: Performance of table with large number of column families

2012-11-09 Thread William Slacum
the impression that this would be really fast since I have a column family bloom filter turned on. Is this not correct? On Fri, Nov 9, 2012 at 12:15 PM, William Slacum wilhelm.von.cl...@accumulo.net wrote: When I said smaller of tablets, I really mean smaller number of rows :) My apologies. So

Re: thread safety of IndexedDocIterator

2012-11-05 Thread William Slacum
At one point, Keith had warned me against kicking off threads inside a scan session. Is it possible we could have a discussion on the implications of this? On Mon, Nov 5, 2012 at 11:30 AM, Billie Rinaldi bil...@apache.org wrote: On Mon, Nov 5, 2012 at 11:24 AM, Sukant Hajra

Re: Accumulo Map Reduce is not distributed

2012-11-02 Thread William Slacum
What about the main method that calls ToolRunner.run? If you have 4 jobs being created, then you're calling run(String[]) or runOneTable() 4 times. On Fri, Nov 2, 2012 at 5:21 PM, Cornish, Duane C. duane.corn...@jhuapl.edu wrote: Thanks for the prompt response John! When I say that

Re: Filter Implementation - Accumulo 1.3

2012-10-23 Thread William Slacum
Make sure that the class is available to the tserver process. This is done by putting the jar containing your class on all nodes under the $ACCUMULO_HOME/lib/ext directory. If you put it under lib/ext, then you won't need to stop and restart the process for the tserver to pick it up. On Tue,

Re: [VOTE] accumulo-1.4.2 RC2

2012-10-22 Thread William Slacum
-1, since I'm running into the rat issue reported by Dave Medinets when running build.sh. On Mon, Oct 22, 2012 at 12:20 PM, Keith Turner ke...@deenlo.com wrote: On Mon, Oct 22, 2012 at 9:52 AM, Josh Elser josh.el...@gmail.com wrote: I agree. If it's not a quick fix, we should just revert the

Re: [VOTE] accumulo-1.4.2 RC2

2012-10-22 Thread William Slacum
22, 2012 at 10:09 PM, Eric Newton eric.new...@gmail.com wrote: Can you identify a file that is missing a license or has an incorrect license? I have run the build on RHEL 6, and Ubuntu 12.04. In what environment does the build fail? -Eric On Mon, Oct 22, 2012 at 10:06 PM, William Slacum

Re: [VOTE] accumulo-1.4.2 RC2

2012-10-22 Thread William Slacum
pass but I don't think the issue on trunk is related to odp files. On Mon, Oct 22, 2012 at 10:34 PM, William Slacum wilhelm.von.cl...@accumulo.net wrote: I replied to the thread David made, since I think Billie has run into the same issue. I'm on OSX 10.7.5 and I believe it's docs/src

Re: compressing values returned to scanner

2012-10-01 Thread William Slacum
If you aren't often looking at the data in the value on the tablet server (like in an iterator), you can also pre-compress your values on ingest. On Mon, Oct 1, 2012 at 12:19 PM, Marc Parisi m...@accumulo.net wrote: You could compress the data in the value, and decompress the data upon receipt
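
Pre-compressing on ingest amounts to wrapping the value bytes before they go into a mutation and unwrapping them on the client after a scan. Below is a minimal sketch using only `java.util.zip`; the choice of GZIP and the class and method names are assumptions, and the Accumulo write and read calls are deliberately left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class ValueCompressionSketch {
    /** Compresses raw value bytes; call this before building the mutation. */
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Decompresses value bytes; call this on the client after reading the entry. */
    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = "some value that is stored once and served often".getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(raw);
        System.out.println(new String(decompress(packed), StandardCharsets.UTF_8));
    }
}
```

The trade-off mentioned in the message still applies: any server-side iterator that needs to inspect the value would have to decompress it first.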

Re: compressing values returned to scanner

2012-10-01 Thread William Slacum
: That is exactly my use case (ingest once, serve often, no server-side iterators). And I'm doing pre-compression on ingest. I was just looking to do away with app-level compression code. Not a biggie. Ameet On Mon, Oct 1, 2012 at 3:32 PM, William Slacum wilhelm.von.cl...@accumulo.net

Re: sanity checking application WALogs make sense

2012-09-15 Thread William Slacum
I'm a bit confused as to what you mean if an iterator goes down mid-processing. If it goes down at all, then whatever scope it's running in- minor compaction, major compaction and scan- will most likely go down as well (unless your iterator eats an exception and ignores errors). A WALog shouldn't

Re: Running Accumulo straight from Memory

2012-09-11 Thread William Slacum
Woops- slow innurnet and didn't notice Eric's response. On Tue, Sep 11, 2012 at 9:30 AM, William Slacum wilhelm.von.cl...@accumulo.net wrote: You could mount a RAM disk and point HDFS to it. On Tue, Sep 11, 2012 at 9:02 AM, Moore, Matthew J. matthew.j.mo...@saic.com wrote: Has anyone

Re: Custom Iterators

2012-08-22 Thread William Slacum
An or clause should be able to handle an enumeration of values, as that's supported in a JEXL expression. It would not, however, surprise me if those iterators could not handle multiple rows in a tablet. If you can reproduce that, please file a ticket. There will be a large update occurring to the

Re: Using Accumulo as input to a MapReduce job frequently hangs due to lost Zookeeper connection

2012-08-16 Thread William Slacum
What does your TServer debug log say? Also, are you writing back out to Accumulo? To follow up what Jim said, you can check the zookeeper log to see if max connections is being hit. You may also want to check and see what your max xceivers is set to for HDFS and check your Accumulo and HDFS logs

Re: [External] Re: Problem importing directory to Accumulo table

2012-07-17 Thread William Slacum
Did you configure hadoop to store your HDFS instance/data somewhere other than /tmp? Look up the single node set up in the Hadoop docs. On Tue, Jul 17, 2012 at 12:07 PM, Shrestha, Tejen [USA] shrestha_te...@bah.com wrote: This is the error that was produced. java.io.FileNotFoundException: File

Re: [External] Re: Problem importing directory to Accumulo table

2012-07-17 Thread William Slacum
Also it looks like your app is storing something in /tmp/files, so you may want to make sure that you mean to be looking on your local FS or in HDFS. On Tue, Jul 17, 2012 at 12:27 PM, William Slacum wsla...@gmail.com wrote: Did you configure hadoop to store your HDFS instance/data somewhere

Re: more questions about IndexedDocIterators

2012-07-16 Thread William Slacum
1) The class hierarchy is a little convoluted, but there doesn't seem to be anything necessarily broken about the FamilyIntersectingIterator/IndexedDocIterator that would prevent it from being backported from trunk to a 1.3.x branch. AFAIK the SortedKeyValueIterator interface has remained

Re: Chain Jobs and Accumulo.

2012-07-16 Thread William Slacum
http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201203.mbox/%3ccaocnvr0osrawytau7lt+agf0bmmcwfhrgpj8_ga4u6mac2y...@mail.gmail.com%3E It looks like the old API was given a second chance at life and is now being billed as the stable API. On Mon, Jul 16, 2012 at 2:39 PM, Billie J

Re: Chain Jobs and Accumulo.

2012-07-16 Thread William Slacum
mapred was deprecated as of 0.20.0 (http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/InputFormat.html) :) On Mon, Jul 16, 2012 at 2:49 PM, Juan Moreno jwellington.mor...@gmail.com wrote: The hadoop API is very confusing in that regard. Currently Accumulo runs atop 0.20

Re: Chain Jobs and Accumulo.

2012-07-16 Thread William Slacum
? Would I have to do something as complex as InputFormatBase ? (It's a mammoth class) On Jul 16, 2012 5:53 PM, William Slacum wilhelm.von.cl...@accumulo.net wrote: mapred was deprecated as of 0.20.0 ( http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/InputFormat.html

Re: more questions about IndexedDocIterators

2012-07-15 Thread William Slacum
I'm on a phone, so excuse the lack of info/answers, but #5 is because the IntersectingIterator is essentially a proof of concept piece of code. There's no reason you shouldn't be able to do one term. The Wikipedia example is able to handle single term queries. The code is a bit rough to read, but

Re: java.lang.VerifyError: Cannot inherit from final class

2012-07-14 Thread William Slacum
Looks like the stack trace is finishing up in the Thrift stuff-- I wonder if you have a newer version of Thrift on the client? On Sat, Jul 14, 2012 at 10:33 PM, Josh Elser josh.el...@gmail.com wrote: Can you post some more information about how you're running your program on your Windows client

Re: Cleaning tablet server entries from zookeeper - would it be possible to add percent complete?

2012-07-10 Thread William Slacum
It can take a long time if your tablet server isn't responsive, you're major compacting, or there's some other issue going on in your ecosystem (i.e., the NameNode/DataNode has barfed or even ZooKeeper itself has locked up). Check your monitor to see what it's trying to do and also check that HDFS

Re: querying the tablet server for given row (to get locality)?

2012-07-01 Thread William Slacum
A tablet will contain at minimum one row. So, if you shard/partition, eventually your data will grow to the point that each tablet will essentially be one row. On Jul 1, 2012 2:17 PM, Sukant Hajra qn2b6c2...@snkmail.com wrote: I've been considering using distributed messaging service (Akka in my

Re: strategies beyond intersecting iterators?

2012-07-01 Thread William Slacum
By iterator stack I am referring to the Accumulo iterators. Resource sharing among scan sessions is implemented by destroying a user scan session and eventually recreating the iterator stack. The new stack is then seek'd to the last key returned by the entire stack. If you were holding some state,

Re: strategies beyond intersecting iterators?

2012-07-01 Thread William Slacum
You can think of the Intersecting (and Or) iterator as a tree of merging keys. So, let's assume we have the following index in a given partition. The partition will have the row partitionN. partitionN Bill: 1 partitionN Bill: 2 partitionN Bill: 3 partitionN Josh: 3 partitionN Josh: 4
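
To make the "merging keys" idea concrete, here is a minimal sketch in plain Java, not the actual IntersectingIterator code. It assumes each term's doc IDs arrive in sorted order and uses only the entries visible in the snippet above (Bill: 1, 2, 3 and Josh: 3, 4). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not Accumulo's implementation.
final class SortedIntersection {
    /** Intersects two ascending lists of doc IDs by advancing whichever cursor is behind. */
    static List<Integer> intersect(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = Integer.compare(a.get(i), b.get(j));
            if (cmp == 0) {
                // Both terms point at the same doc ID: it is a match.
                out.add(a.get(i));
                i++;
                j++;
            } else if (cmp < 0) {
                i++; // the first list is behind, so move it forward
            } else {
                j++; // the second list is behind, so move it forward
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> bill = Arrays.asList(1, 2, 3); // doc IDs indexed under "Bill"
        List<Integer> josh = Arrays.asList(3, 4);    // doc IDs indexed under "Josh"
        System.out.println(intersect(bill, josh));   // prints [3]
    }
}
```

Because both inputs are sorted, each list is walked once, which is why keeping the index sorted by doc ID within a partition matters for this kind of merge.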

Re: querying for relevant rows

2012-06-29 Thread William Slacum
You can use a BatchScanner and give it two ranges. It would look something like: ArrayList<Range> ranges = new ArrayList<Range>(); ranges.add(new Range(new Key(timestamp1))); ranges.add(new Range(new Key(timestamp2))); BatchScanner bs = con.createBatchScanner(...); //set your iterators and filters

Re: querying for relevant rows

2012-06-29 Thread William Slacum
Oh, did I interpret this wrong? I originally thought all of the timestamps would be enumerated as rows, but after re-reading, I kind of get the idea that the rows are being used as markers in a skip list like fashion. On Fri, Jun 29, 2012 at 11:52 AM, Adam Fuchs afu...@apache.org wrote: You

Re: strategies beyond intersecting iterators?

2012-06-28 Thread William Slacum
You're pretty much on the spot regarding two aspects about the current IntersectingIterator: 1- It's not really extensible (there are hooks for building doc IDs, but you still need the same `partition term: docId` key structure) 2- Its main strength is that it can do the merges of sorted lists of

Re: [External] Re: accumulo init not working

2012-06-18 Thread William Slacum
Did your NameNode start up correctly? If on a local instance, you can verify this by running `jps -lm`. If jps isn't on your path, it should be located in $JAVA_HOME/bin. If the NameNode is not running, check your Hadoop logs. The log you want should have namenode in the file name-- it should

Re: Is it possible to use an iterator to aggregate results of a BatchScanner?

2012-06-11 Thread William Slacum
So, is a global sorting order required of your iterator? That's really the key behavioral difference in terms of output when you're dealing with a Scanner versus a BatchScanner. Please correct me if I'm wrong about assuming you're trying to get a distribution for the column families that appear

Re: how to use CountingIterator to count records?

2012-06-06 Thread William Slacum
You're kind of there. Essentially, you can think of your Scanner's interactions with the TServers as a tree with a height of two. Your Scanner is the root and its children are all of the TServers it needs to interact with. Essentially, the operation you'd want is to sum the number of records each
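
As a rough sketch of that two-level tree, assuming each TServer hands back one partial count and the client adds them up, the client-side step might look like the following. The class name, method name, and sample numbers are hypothetical, and this is plain Java rather than Accumulo API code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; the partial counts stand in for per-TServer results.
final class PartialCounts {
    /** Sums the partial record counts reported by each child of the root. */
    static long total(List<Long> perServerCounts) {
        long sum = 0;
        for (long count : perServerCounts) {
            sum += count;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up partial counts, one per TServer the Scanner talked to.
        List<Long> partials = Arrays.asList(10L, 25L, 7L);
        System.out.println(total(partials)); // prints 42
    }
}
```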
