Re: Why may "tablet read ahead" take long time? (was: Profile a (batch) scan)

2019-01-15 Thread Adam Fuchs
Hi Maxim, What you're seeing is an artifact of the threading model that Accumulo uses. When you launch a query, Accumulo tablet servers will coordinate RPCs via Thrift in one thread pool (which grows unbounded) and queue up scans (rfile lookups, decryption/decompression, iterators, etc.) in

Re: Major Compactions

2017-12-12 Thread Adam Fuchs
Watch out for ACCUMULO-4578 if you're using --cancel on one of the affected versions (1.7.2 or 1.8.0 or earlier). Adam On Tue, Dec 12, 2017 at 7:57 AM, Mike Walch wrote: > There should be a mention of the --cancel option in the docs. I created a > PR to add it to the 2.0

Re: Key Refactoring

2017-06-21 Thread Adam Fuchs
Sven, You might consider using a combination of AccumuloInputFormat and AccumuloFileOutputFormat in a map/reduce job. The job will run in parallel, speeding up your transformation, the map/reduce framework should help with hiccups, and the bulk load at the end provides an atomic, eventually

Re: Accumulo Seek performance

2016-09-12 Thread Adam Fuchs
cache hit rate was? Adam On Mon, Sep 12, 2016 at 9:14 AM, Josh Elser <josh.el...@gmail.com> wrote: > 5 iterations, figured that would be apparent from the log messages :) > > The code is already posted in my original message. > > Adam Fuchs wrote: > >> Josh, >>

Re: Accumulo Seek performance

2016-09-12 Thread Adam Fuchs
Josh, Two questions: 1. How many iterations did you do? I would like to see an absolute number of lookups per second to compare against other observations. 2. Can you post your code somewhere so I can run it? Thanks, Adam On Sat, Sep 10, 2016 at 3:01 PM, Josh Elser

Re: Adding a second node to a single node installation

2016-05-23 Thread Adam Fuchs
Cyrille, I think you're going to have to do a few things to get the nodes to act as a cluster: 1. How would you like your Zookeeper cluster to be set up? If you're planning on using a one-node Zookeeper instance on the master node, then you may need to turn zookeeper off on your second node and

Re: Accumulo folks at Hadoop Summit San Jose

2016-05-19 Thread Adam Fuchs
I'll be there. Adam On Thu, May 19, 2016 at 11:01 AM, Josh Elser wrote: > Out of curiosity, are there going to be any Accumulo-folks at Hadoop > Summit in San Jose, CA at the end of June? > > - Josh >

Re: Three day Fluo Common Crawl test

2016-01-12 Thread Adam Fuchs
Nice writeup! Thanks, Adam On Tue, Jan 12, 2016 at 11:59 AM, Keith Turner wrote: > We just completed a three day test of Fluo using Common Crawl data that > went pretty well. > > http://fluo.io/webindex-long-run/ > > >

Re: Trigger for Accumulo table

2015-12-08 Thread Adam Fuchs
I totally agree, Christopher. I have also run into a few situations where it would have been nice to have something like a mutation listener hook. Particularly in generating indexing and stats records. Adam On Tue, Dec 8, 2015 at 5:59 PM, Christopher wrote: > In the

Re: Can't connect to Accumulo

2015-12-04 Thread Adam Fuchs
Mike, I suspect if you get rid of the "localhost" line and restart Accumulo then you will get services listening on the non-loopback IPs. Right now you have some of your processes accessible outside your VM and others only accessible from inside, and you probably have two tablet servers when you

Re: Quick question re UnknownHostException

2015-11-13 Thread Adam Fuchs
Josef, If these are intermittent failures, you might consider turning on the watcher [1] to automatically restart your processes. This should keep your cluster from atrophying over time. You'll still have to take administrative action to fix the DNS problem, but your availability should be

Re: pre-sorting row keys vs not pre-sorting row keys

2015-10-29 Thread Adam Fuchs
I bet what you're seeing is more efficient batching in the latter case. BatchWriter goes through a binning phase whenever it fills up half of its buffer, binning everything in the buffer into tablets. If you give it sorted data it will probably be binning into a subset of the tablets instead of
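
The binning effect described above can be illustrated with a small sketch. This is not BatchWriter code: the split points, the tabletFor helper, and the buffer contents are all hypothetical, chosen only to show why a buffer of sorted rows tends to land in fewer tablets than a buffer of scattered rows.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class BinningSketch {
    // Hypothetical split points: each tablet holds rows up to and including its end row,
    // and one final tablet holds everything after the last split.
    static final TreeSet<String> SPLITS = new TreeSet<>(Arrays.asList("d", "h", "m", "r"));

    // Returns the end row of the tablet that would hold this row ("+inf" for the last tablet).
    static String tabletFor(String row) {
        String end = SPLITS.ceiling(row);
        return end == null ? "+inf" : end;
    }

    // Counts how many distinct tablets one buffer's worth of rows is binned into.
    static int tabletsTouched(List<String> buffer) {
        Set<String> tablets = new HashSet<>();
        for (String row : buffer) {
            tablets.add(tabletFor(row));
        }
        return tablets.size();
    }

    public static void main(String[] args) {
        // A buffer filled from sorted input covers a narrow slice of the key space.
        System.out.println(tabletsTouched(Arrays.asList("a", "b", "c")));
        // A buffer filled from unsorted input is spread across the key space.
        System.out.println(tabletsTouched(Arrays.asList("a", "k", "z")));
    }
}
```

Fewer tablets per buffer means fewer, larger batches per tablet server, which is one plausible reading of the "more efficient batching" mentioned in the message.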

Re: Is there a sensible way to do this? Sequential Batch Scanner

2015-10-28 Thread Adam Fuchs
Rob, I would use something like an IteratorChain [1] and feed it Scanner.iterator() objects. If you setReadaheadThreshold(0) on the scanner then calling Scanner.iterator() is a fairly lightweight operation, and you'll be able to plop a bunch of iterators into the IteratorChain so that they are
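
The chaining idea can be sketched without any Accumulo or Commons Collections dependency. The chain helper below is a hypothetical stand-in for IteratorChain, and plain list iterators stand in for Scanner.iterator() results; the point is only that each source is consumed in turn, and the next one is touched only when the current one runs out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ChainSketch {
    // Walks several iterators back to back, advancing to the next source
    // only once the current one is exhausted.
    static <T> Iterator<T> chain(List<Iterator<T>> sources) {
        final Iterator<Iterator<T>> outer = sources.iterator();
        return new Iterator<T>() {
            private Iterator<T> current = null;

            @Override
            public boolean hasNext() {
                while ((current == null || !current.hasNext()) && outer.hasNext()) {
                    current = outer.next();
                }
                return current != null && current.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
    }

    // Stand-ins for three scanners; the middle one returns no entries.
    static List<String> demo() {
        List<Iterator<String>> sources = new ArrayList<>();
        sources.add(Arrays.asList("a", "b").iterator());
        sources.add(Collections.<String>emptyIterator());
        sources.add(Arrays.asList("c").iterator());

        List<String> out = new ArrayList<>();
        Iterator<String> chained = chain(sources);
        while (chained.hasNext()) {
            out.add(chained.next());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```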

Re: Why the Range not find the data

2015-10-14 Thread Adam Fuchs
Try using the Range.exact(...) and Range.prefix(...) helper methods to generate specific ranges. Key.followingKey(...) might also be helpful. Cheers, Adam On Wed, Oct 14, 2015 at 9:59 AM, Lu Qin wrote: > In my accumulo cluster ,the table has this data: > 0 cf0:cq0 []v0

Re: What is the optimal number of tablets for a large table?

2015-10-13 Thread Adam Fuchs
Here are a few other factors to consider: 1. Tablets may not be used uniformly. If there is a temporal element to the row key then writes and reads may be skewed to go to a portion of the tablets. If some tables are big but more archival in nature then they will skew the stats as well. It's

Re: Watching for Changes with Write Ahead Log?

2015-10-01 Thread Adam Fuchs
>> 2) Checked that against changes I know my system has made >> >> 3) If my system is not the originator of the change, update >> internal state to reflect the change. >> >> >> >> Examples of state I may need to update include an ElasticSearch i

Re: Document Partitioned Indexing

2015-09-30 Thread Adam Fuchs
Hi Tom, Sqrrl uses a document-distributed indexing strategy extensively. On top of the reasons you mentioned, we also like the ability to explicitly structure our index entries in both information content and sort order. This gives us the ability to do interesting things like build custom indexes

Re: Watching for Changes with Write Ahead Log?

2015-09-29 Thread Adam Fuchs
Jon, You might think about putting a constraint on your table. I think the API for constraints is flexible enough for your purpose, but I'm not exactly sure how you would want to manage the results / side effects of your observations. Adam On Tue, Sep 29, 2015 at 5:41 PM, Parise, Jonathan

Re: Presplitting tables for the YCSB workloads

2015-09-18 Thread Adam Fuchs
You could cat the splits to a temp file, then use the -sf option of createtable, piping the command to the accumulo shell's standard in: $ echo "createtable ycsb_tablename -sf /tmp/ycsb_splits.txt" | accumulo shell -u user -p password -z instancename zoohost:2181 Not sure if the row keys are

Re: RowID design and Hive push down

2015-09-14 Thread Adam Fuchs
Hi Roman, What's the used for in your previous key design? As I'm sure you've figured out, it's generally a bad idea to have a fully unique hash in your key, especially if you're trying to support extensive secondary indexing. What we've found is that it's not just the size of the key but also

Re: Accumulo: "BigTable" vs. "Document Model"

2015-09-04 Thread Adam Fuchs
Sqrrl uses a hybrid approach. For records that are relatively static we use a compacted form, but for maintaining aggregates and for making updates to the compacted form documents we use a more explicit form. This is done mostly through iterators and a fairly complex type system. The big trade-off

rya incubator proposal

2015-09-03 Thread Adam Fuchs
Hey Accumulopers, I thought you might like to know that the Rya project just proposed to join the incubator. Rya is a mature project that supports RDF on top of Accumulo. Feel free to join the discussion or show support on the incubator general list. Cheers, Adam

Re: Questions on intersecting iterator and partition ids

2015-07-13 Thread Adam Fuchs
Vaibhav, I have included some answers below. Cheers, Adam On Mon, Jul 13, 2015 at 11:19 AM, vaibhav thapliyal vaibhav.thapliyal...@gmail.com wrote: Dear all, I have the following questions on intersecting iterator and partition ids used in document sharded indexing: 1. Can we run a

Re: micro compaction

2015-06-09 Thread Adam Fuchs
I think this might be the same concept as in-mapper combining, but applied to data being sent to a BatchWriter rather than an OutputCollector. See [1], section 3.1.1. A similar performance analysis and probably a lot of the same code should apply here. Cheers, Adam [1]

Re: Change column family

2015-05-26 Thread Adam Fuchs
This can also be done with a row-doesn't-fit-into-memory constraint. You won't need to hold the second column in-memory if your iterator tree deep copies, filters, transforms and merges. Exhibit A: [HeapIterator-derivative] |_ | \

Re: Accumulo Summit 2015

2015-05-04 Thread Adam Fuchs
. -Met with some great folks (special shout out to Josh Elser and Adam Fuchs for their time and patience answering questions). -Can’t wait for next year’s summit. Any idea when the slides for the presentations will be available? Thanks, Mike G. This communication, along

Re: Unexpected aliasing from RFile getTopValue()

2015-04-15 Thread Adam Fuchs
On Wed, Apr 15, 2015 at 10:20 AM, Keith Turner ke...@deenlo.com wrote: Random thought on revamp. Immutable key values with enough primitives to make most operations efficient (avoid constant alloc/copy) might be something to consider for the iterator API So, is this a tradeoff in the

Re: Scans during Compaction

2015-02-23 Thread Adam Fuchs
Dylan, The effect of a major compaction is never seen in queries before the major compaction completes. At the end of the major compaction there is a multi-phase commit which eventually replaces all of the old files with the new file. At that point the major compaction will have completely

Re: Scans during Compaction

2015-02-23 Thread Adam Fuchs
, Adam Fuchs afu...@apache.org wrote: Dylan, The effect of a major compaction is never seen in queries before the major compaction completes. At the end of the major compaction there is a multi-phase commit which eventually replaces all of the old files with the new file. At that point the major

Re: Iterators adding data: IteratorEnvironment.registerSideChannel?

2015-02-16 Thread Adam Fuchs
Dylan, If I recall correctly (which I give about 30% odds), the original purpose of the side channel was to split up things like delete tombstone entries from regular entries so that other iterators sitting on top of a bifurcating iterator wouldn't have to handle the special tombstone

Re: Iterators adding data: IteratorEnvironment.registerSideChannel?

2015-02-16 Thread Adam Fuchs
of adding another data stream as a top-level source, but Fig. B is possible too. Regards, Dylan Hutchison On Mon, Feb 16, 2015 at 11:34 AM, Adam Fuchs scubafu...@gmail.com wrote: Dylan, If I recall correctly (which I give about 30% odds), the original purpose of the side channel was to split

Re: Keys with identical timestamps

2015-02-09 Thread Adam Fuchs
Hi Dave, As long as your combiner is associative and commutative both of the values should be represented in the combined result. The non-determinism is really around ordering, which generally doesn't matter for a combiner. Adam On Mon, Feb 9, 2015 at 3:49 PM, Dave Hardcastle
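
A small sketch of why ordering does not matter for an associative and commutative combiner. This is plain Java rather than the Accumulo Combiner API, and the sum and lastWins helpers are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class CombinerSketch {
    // Summation is associative and commutative: any arrival order gives the same result.
    static long sum(List<Long> values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    // "Keep the last value seen" is order-sensitive, so it would expose the
    // non-deterministic ordering of keys with identical timestamps.
    static long lastWins(List<Long> values) {
        return values.get(values.size() - 1);
    }

    public static void main(String[] args) {
        List<Long> oneOrder = Arrays.asList(3L, 4L);
        List<Long> otherOrder = Arrays.asList(4L, 3L);
        System.out.println(sum(oneOrder) == sum(otherOrder));
        System.out.println(lastWins(oneOrder) == lastWins(otherOrder));
    }
}
```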

Re: hdfs cpu usage

2015-02-09 Thread Adam Fuchs
Ara, What kind of query load are you generating within your batch scanners? Are you using an iterator that seeks around a lot? Are you grabbing many small batches (only a few keys per range) from the batch scanner? As a wild guess, this could be the result of lots of seeks with a low cache hit

Re: Seeking Iterator

2015-01-12 Thread Adam Fuchs
On Mon, Jan 12, 2015 at 4:10 PM, Josh Elser josh.el...@gmail.com wrote: seek()'ing doesn't always imply an increase in performance -- remember that RFiles (the files that back Accumulo tables), are composed of multiple blocks/sections with an index of them. A seek is comprised of using that

Re: Accumulo available in Fedora 21

2014-12-15 Thread Adam Fuchs
Neato! Adam On Mon, Dec 15, 2014 at 3:25 PM, Christopher ctubb...@apache.org wrote: Accumulators, Fedora Linux now ships with Accumulo 1.6 packaged and available in its yum repositories, as of Fedora 21. Simply run yum install accumulo to get started. You can also just install

Re: comparing different rfile densities

2014-11-11 Thread Adam Fuchs
Jeff, Density is an interesting measure here, because RFiles are going to be sorted such that, even when the file is split between tablets, a read of the file is going to be (mostly) a sequential scan. I think instead you might want to look at a few other metrics: network overhead, name node

Re: Remotely Accumulo

2014-10-06 Thread Adam Fuchs
Accumulo tservers typically listen on a single interface. If you have a server with multiple interfaces (e.g. loopback and eth0), you might have a problem in which the tablet servers are not listening on externally reachable interfaces. Tablet servers will list the interfaces that they are

Re: Compaction slowing queries

2014-09-11 Thread Adam Fuchs
Paul, Here are a few suggestions: 1. Reduce the number of concurrent compaction threads (tserver.compaction.major.concurrent.max, and tserver.compaction.minor.concurrent.max). You probably want to lean towards twice as many major compaction threads as minor, but that somewhat depends on how

Re: Compaction slowing queries

2014-09-11 Thread Adam Fuchs
You can change compression codecs at any time on a per-table basis. This only affects how new files are written. Existing files will still be read the same way. See the table.file.compress.type parameter. One caveat is that you need to make sure your codec is supported before switching to it or

Re: Advice on increasing ingest rate

2014-04-09 Thread Adam Fuchs
, 2014 4:42 PM, Mike Hugo m...@piragua.com wrote: On Tue, Apr 8, 2014 at 4:35 PM, Adam Fuchs afu...@apache.org wrote: Mike, What version of Accumulo are you using, how many tablets do you have, and how many threads are you using for minor and major compaction pools? Also, how big are the keys

Re: HDFS caching w/ Accumulo?

2014-02-26 Thread Adam Fuchs
Maybe this could be used to speed up WAL recovery for use cases that demand really high availability and low latency? Adam On Feb 25, 2014 10:50 AM, Donald Miner dmi...@clearedgeit.com wrote: HDFS caching is part of the new Hadoop 2.3 release. From what I understand, it allows you to mark

Re: WAL - rate limiting factor x4.67

2013-12-04 Thread Adam Fuchs
One thing you can do is reduce the replication factor for the WAL. We have found that makes a pretty significant difference in write performance. That can be modified with the tserver.wal.replication property. Setting it to 2 instead of the default (probably 3) should give you some performance

Re: Efficient Tablet Merging [SEC=UNOFFICIAL]

2013-10-03 Thread Adam Fuchs
Never underestimate the power of ascii art! Adam On Oct 2, 2013 11:28 PM, Eric Newton eric.new...@gmail.com wrote: I'll use ASCII graphics to demonstrate the size of a tablet. Small: [] Medium: [ ] Large: [ ] Think of it like this... if you are running age-off... you probably have lots

Re: My Accumulo 1.5.0 instance has no tablet servers

2013-10-01 Thread Adam Fuchs
To follow up on this, I think maybe the config should be <name>dfs.datanode.synconclose</name>, not <name>dfs.data.synconclose</name>. Was that a typo, Eric? Thanks, Adam On Thu, Sep 12, 2013 at 2:31 PM, Eric Newton eric.new...@gmail.com wrote: Add: <property> <name>dfs.support.append</name>

Re: Trouble with IntersectingIterator

2013-10-01 Thread Adam Fuchs
Heath, In your case, the question that you are effectively asking is within each partition, which documents' index entries include all of the given terms. Since you have partitions aligned by field and only a single index entry per field you will not get any matches for queries with more than one

RE: Assigned and hosted Error [SEC=UNOFFICIAL]

2013-09-30 Thread Adam Fuchs
Matt, Did you include any patches that have not been committed to the 1.5 branch in your snapshot? Adam On Sep 30, 2013 6:25 PM, Dickson, Matt MR matt.dick...@defence.gov.au wrote: UNOFFICIAL 1.5.1-SNAPSHOT from 20/09/13. -- From: Sean Busbey

Re: BatchWriter performance on 1.4

2013-09-19 Thread Adam Fuchs
The addMutations method blocks when the client-side buffer fills up, so you may see a lot of time spent in that method due to a bottleneck downstream. There are a number of things you could try to speed that up. Here are a few: 1. Increase the BatchWriter's buffer size. This can smooth out the

Re: Getting the IP Address

2013-08-28 Thread Adam Fuchs
Seems like a question as common and complex as which IP address to listen on would have a fair amount of precedent in open-source projects that we could pull from. Are we reinventing the wheel? Does anyone have an example of an application like ours with the same set of supported platforms that has

Re: master fails to start

2013-05-21 Thread Adam Fuchs
Chris, Did you copy the conf/accumulo.policy.example to conf/accumulo.policy? If so, you may need to make some changes to account for changes to hadoop security. I suspect the problem is that the codebase file:${hadoop.home.dir}/lib/* reference doesn't include your CDH3 libraries. You could

Re: [VOTE] 1.5.0-RC3

2013-05-17 Thread Adam Fuchs
Looks like the src part of the distribution is accumulo-project-1.5.0-src.tar.gz. For the same reasons that we removed the assemble tag from the bin package, shouldn't we remove the project tag from the src package? This also has implications as to whether we can just untar both the bin and src

Re: [VOTE] 1.5.0-RC3

2013-05-17 Thread Adam Fuchs
Thanks for putting up with us picky people, Chris! Adam On May 17, 2013 6:15 PM, Christopher ctubb...@apache.org wrote: So, I've fixed the problem with the src tarball including binaries, and I believe I've satisfied all the concerns regarding the naming conventions. I'm going to go ahead

Re: Accumulo software and processes owner

2013-04-26 Thread Adam Fuchs
Terry, To properly secure your Accumulo install it's important that the shared secret in the Accumulo configs only be shared with the Accumulo processes, so I would recommend using a separate accumulo user. In HDFS you can create the directory that Accumulo writes to (/accumulo by default) and

Re: Suggestions on modeling a composite row key

2013-02-27 Thread Adam Fuchs
At sqrrl, we tend to use a Tuple class that implements List<String> (List<ByteBuffer> would also work), and has conversions to and from ByteBuffer. To encode the tuple into a byte buffer, change all the \1s to \1\2, change all the \0s to \1\1, and put a \0 byte between elements. \1 is used as an
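
The escaping rule described above can be sketched as follows. This is not the sqrrl Tuple class: the class name, the use of UTF-8 strings, and the decode method are assumptions added for illustration. Only the three encoding rules (\1 becomes \1\2, \0 becomes \1\1, and a \0 byte between elements) come from the message.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class TupleCodecSketch {
    // Encodes the elements so that a raw 0x00 byte only ever appears as a separator.
    static byte[] encode(List<String> tuple) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < tuple.size(); i++) {
            if (i > 0) {
                out.write(0);                      // \0 between elements
            }
            for (byte b : tuple.get(i).getBytes(StandardCharsets.UTF_8)) {
                if (b == 1) {                      // \1 -> \1\2
                    out.write(1);
                    out.write(2);
                } else if (b == 0) {               // \0 -> \1\1
                    out.write(1);
                    out.write(1);
                } else {
                    out.write(b);
                }
            }
        }
        return out.toByteArray();
    }

    // Reverses encode(); assumes well-formed input produced by encode().
    static List<String> decode(byte[] encoded) {
        List<String> tuple = new ArrayList<>();
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length; i++) {
            byte b = encoded[i];
            if (b == 0) {                          // separator: close the current element
                tuple.add(new String(current.toByteArray(), StandardCharsets.UTF_8));
                current.reset();
            } else if (b == 1) {                   // escape: the next byte selects the original
                byte next = encoded[++i];
                current.write(next == 2 ? 1 : 0);
            } else {
                current.write(b);
            }
        }
        tuple.add(new String(current.toByteArray(), StandardCharsets.UTF_8));
        return tuple;
    }
}
```

One caveat of this sketch: an empty tuple and a tuple holding one empty string encode to the same empty byte array, so a real implementation would need its own convention for that case.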

Re: Determining the cause of a tablet server failure

2013-02-27 Thread Adam Fuchs
There are a few primary reasons why your tablet server would die: 1. Lost lock in Zookeeper. If the tablet server and zookeeper can't communicate with each other then the lock will timeout and the tablet server will kill itself. This should show up as several messages in the tserver log. If this

Re: Determining the cause of a tablet server failure

2013-02-27 Thread Adam Fuchs
never did see anything in our log files or .out / .err logs indicating the source of the problem, but the above is my best guess as to what was going on. Thanks again for all the tips and pointers! Mike On Wed, Feb 27, 2013 at 11:24 AM, Adam Fuchs afu...@apache.org wrote: There are a few

Re: NoSuchMethodError: FieldValueMetaData (Conflict between hue-plugins-1.2.0-cdh3u5.har and libthrift-0.6.1.jar)

2013-02-08 Thread Adam Fuchs
Is that related to https://issues.apache.org/jira/browse/ACCUMULO-837? Do you have a stack trace you can share? Adam On Fri, Feb 8, 2013 at 10:34 AM, David Medinets david.medin...@gmail.com wrote: I am running a map-reduce job. As soon as my mapper tried to serialize a Mutation I run into a

Re: infinite number of max.versions?

2013-01-28 Thread Adam Fuchs
Mike, The way to do that is to remove the versioning iterator entirely. Just delete the configuration parameters for that iterator: something like config -t tablename -d table.iterator.scan.vers in the accumulo shell, for each of the six configuration parameters. Adam On Mon, Jan 28, 2013 at

Re: Custom Iterators - behavior when switching tablets

2013-01-23 Thread Adam Fuchs
David, The core challenge here is to be able to continue scans under failure conditions. There are several places where we tear down the iterator tree and rebuild it, including when tablet servers die, when we need to free resources to support concurrency, and a few others. In order to continue a

Re: scripted way to create users

2013-01-18 Thread Adam Fuchs
Using the Java API through JRuby or Jython would be another option. With Jython, that would look something like this: export

Re: Accumulo Junit Concurrency/Latency issues ( Accumulo 1.3 )

2012-11-29 Thread Adam Fuchs
am definitely using the same key to update and retrieve the data. At least update the timestamp to the current time (or old timestamp + 1). -Eric On Thu, Nov 29, 2012 at 10:38 AM, Adam Fuchs afu...@apache.org wrote: Josh, Can you share your junit test code so I can replicate this behavior

Re: [VOTE] accumulo-1.4.2 RC4

2012-11-09 Thread Adam Fuchs
+1 The only problem I have found is that the example policy file is still not included (ACCUMULO-364), but that has been corrected for the next version for real this time. The release notes are slightly wrong in that respect, but I don't think this should delay release. Checked signatures,

Re: Accumulo design questions

2012-11-06 Thread Adam Fuchs
4. In supporting dynamic column families, was there a design trade-off with respect to the original BigTable or current HBase design? What might be a benefit of doing it the other way? One trade-off is that pinning locality groups in memory (i.e. making them ephemeral) would be

Re: Number of partitions for sharded table

2012-10-30 Thread Adam Fuchs
Krishmin, There are a few extremes to keep in mind when choosing a manual partitioning strategy: 1. Parallelism and balance at ingest time. You need to find a happy medium between too few partitions (not enough parallelism) and too many partitions (tablet server resource contention and

Re: [VOTE] accumulo-1.4.2 RC3

2012-10-26 Thread Adam Fuchs
Oops, looks like Eric and I owe donuts. Anyone know how to get vim to automatically add license headers? ;-) Adam On Fri, Oct 26, 2012 at 11:14 AM, Billie Rinaldi bil...@apache.org wrote: -1 These files don't have licenses:

Re: What is the Communication and Time Complexity for Bulk Inserts?

2012-10-24 Thread Adam Fuchs
For the bulk load of one file, shouldn't it be roughly O(log(n) * log(P) * p), where n is the size of the file, P is the total number of tablets (proportional to tablet servers), and p is the number of tablets that get assigned that file? For the BatchWriter case, there's a client-side

Re: Accumulo Between Two Centers (DR - disaster recovery)

2012-09-26 Thread Adam Fuchs
Another way to say this is that cross-data center replication for Accumulo is left to a layer on top of Accumulo (or the application space). Cassandra supports a mode in which you can have a bigger write replication than write quorum, allowing writes to eventually propagate and reads to happen on

Re: bulk ingested table showing zero entries on the monitor page

2012-09-21 Thread Adam Fuchs
John is referring to the streaming ingest, not the bulk ingest. Dave is correct on this one. Basically, we don't count the records when you bulk ingest so that we can get sub-linear runtime on the bulk ingest operation. Adam On Fri, Sep 21, 2012 at 4:22 PM, ameet kini ameetk...@gmail.com wrote:

RE: Running Accumulo straight from Memory

2012-09-12 Thread Adam Fuchs
had wanted. Matt From: user-return-1330-MATTHEW.J.MOORE=saic@accumulo.apache.org [mailto: user-return-1330-MATTHEW.J.MOORE=saic@accumulo.apache.org] On Behalf Of Adam Fuchs Sent: Tuesday, September 11, 2012 5:30 PM To: user@accumulo.apache.org

Re: Running Accumulo straight from Memory

2012-09-11 Thread Adam Fuchs
Matthew, I don't know of anyone who has done this, but I believe you could: 1. mount a RAM disk 2. point the hdfs core-site.xml fs.default.name property to file:/// 3. point the accumulo-site.xml instance.dfs.dir property to a directory on the RAM disk 4. disable the WAL for all tables by setting

RE: ColumnQualifierFilter

2012-09-10 Thread Adam Fuchs
fetchColumn is agglomerative, so if you call it multiple times it will fetch multiple columns. Adam On Sep 10, 2012 6:25 PM, bob.thor...@l-3com.com wrote: Billie That’s what I’m doing at the moment, but I’d like to give the iterator a collection of CF/CQ to filter on. Is that

Re: [receivers.SendSpansViaThrift] ERROR: java.net.ConnectException: Connection refused

2012-09-05 Thread Adam Fuchs
Fred, One tracer is fine, and you can set that to be the same as the master node. You also need to set the username and password for the tracer in accumulo-site.xml if you haven't already. Adam On Sep 5, 2012 1:22 PM, Fred Wolfinger fred.wolfin...@g2-inc.com wrote: Hey Marc, I can't tell you

Re: more questions about IndexedDocIterators

2012-07-16 Thread Adam Fuchs
*SNIP 3. Compressed reverse-timestamp using Unicode tricks? -- I see code in Accumulo like // We're past the index column family, so return a term that will sort // lexicographically last. The last unicode character should suffice

Re: getMasterStats problems

2012-07-15 Thread Adam Fuchs
Jim, The HdfsZooInstance looks for accumulo-site.xml on the classpath to find the directory in HDFS to look for the instance ID. If accumulo-site.xml is not on the classpath then it will default to /accumulo, which is probably different from the directory you are using. accumulo-site.xml also

Re: monitor.Monitor - Unable to contact the garbage collector - connection refused

2012-07-12 Thread Adam Fuchs
Sounds like a good upgrade to me. Could even be done as part of that warning message. Adam On Thu, Jul 12, 2012 at 9:28 PM, David Medinets david.medin...@gmail.com wrote: I am seeing the following output in my monitor_lasho.log file. Would it be possible to display the host and port that is

Re: WholeRowIterator, BatchScanner, and fetchColumnFamily don't play well together?

2012-07-09 Thread Adam Fuchs
John, This was a fun one, but we figured it out. Thanks for providing code -- that helped a lot. The quick workaround is to set the priority of the WholeRowIterator to 21, above the VersioningIterator. Turns out the two iterators are not commutative, so order matters. Solution: when you set up

Re: Recovering Tables from HDFS

2012-07-05 Thread Adam Fuchs
Hi Patrick, The short answer is yes, but there are a few caveats: 1. As you said, information that is sitting in the in-memory map and in the write-ahead log will not be in those files. You can periodically call flush (Connector.getTableOperations().flush(...)) to guarantee that your data has

Re: [VOTE] accumulo-1.3.6 RC1

2012-07-03 Thread Adam Fuchs
+1 Signature looks good Hashes look good Installs and runs well (configured, installed, started, attached with shell, created table, inserted, scanned, flushed, compacted, shut down) Adam On Tue, Jul 3, 2012 at 1:57 PM, Eric Newton eric.new...@gmail.com wrote: I've recreated the build

Re: querying for relevant rows

2012-06-29 Thread Adam Fuchs
You can't scan backwards in Accumulo, but you probably don't need to. What you can do instead is use the last timestamp in the range as the key like this: key=2 value= {a.1 b.1 c.2 d.2} key=5 value= {m.3 n.4 o.5} key=7 value={x.6 y.6 z.7} As long as your ranges are
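
The lookup pattern above can be sketched with a sorted in-memory map standing in for an Accumulo table. The TreeMap and the lookup helper are assumptions for illustration; in Accumulo the equivalent would be a forward scan starting at the query timestamp and reading the first entry. The three rows are the ones from the message.

```java
import java.util.Map;
import java.util.TreeMap;

public class RangeLookupSketch {
    // Each range is keyed by its LAST timestamp, as in the message.
    static final TreeMap<Integer, String> TABLE = new TreeMap<>();
    static {
        TABLE.put(2, "{a.1 b.1 c.2 d.2}");
        TABLE.put(5, "{m.3 n.4 o.5}");
        TABLE.put(7, "{x.6 y.6 z.7}");
    }

    // Seeks forward to the first key >= timestamp, which is the range containing it.
    // Returns null when the timestamp is past the last range.
    static String lookup(int timestamp) {
        Map.Entry<Integer, String> entry = TABLE.ceilingEntry(timestamp);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        System.out.println(lookup(3));
        System.out.println(lookup(6));
    }
}
```

Because the key is the end of each range, a forward seek is enough and no backward scan is needed.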

Re: Incorrectly setting TKey causes NPE (to nobody's surprise)

2012-06-26 Thread Adam Fuchs
The tradeoff would be convenience versus complexity in the API. I would lean towards having fewer ways to create a Key. Has this debate played out before? http://www.wikivs.com/wiki/Python_vs_Ruby#Philosophy Adam On Tue, Jun 26, 2012 at 9:17 AM, David Medinets david.medin...@gmail.com wrote:

Re: Incorrectly setting TKey causes NPE (to nobody's surprise)

2012-06-26 Thread Adam Fuchs
, Jun 26, 2012 at 10:20 AM, Adam Fuchs afu...@apache.org wrote: The tradeoff would be convenience versus complexity in the API. I would lean towards having fewer ways to create a Key. Has this debate played out before? http://www.wikivs.com/wiki/Python_vs_Ruby#Philosophy Adam

Re: Can I connect an InputStream to a Mutation value?

2012-06-19 Thread Adam Fuchs
There's also the concern of elements of the document that are too large by themselves. A general purpose streaming solution would include support for any kind of objects passed in, not just XML with small elements. I think the fact that it is an XML document is probably a red herring in this case.

Re: Can Sort Order Be Reversed?

2012-05-31 Thread Adam Fuchs
Nope, we currently only support one sort order. The closest you can come is by using an encoding that flips the sort order. In this case, you would take every byte and subtract it from 255 to get your new row, so: void convert(byte[] row) { for(int i = 0; i < row.length; i++) row[i] =
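
A self-contained sketch of the byte-flipping idea, since the snippet above is cut off. The class name and the compareUnsigned helper are assumptions for illustration; the helper mimics the unsigned lexicographic byte ordering that sorted row keys use.

```java
import java.nio.charset.StandardCharsets;

public class ReverseOrderSketch {
    // Returns a copy of the row with every byte subtracted from 255.
    static byte[] flip(byte[] row) {
        byte[] out = new byte[row.length];
        for (int i = 0; i < row.length; i++) {
            out[i] = (byte) (255 - (row[i] & 0xff));
        }
        return out;
    }

    // Unsigned lexicographic comparison; a shorter row that is a prefix sorts first.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] apple = "apple".getBytes(StandardCharsets.UTF_8);
        byte[] melon = "melon".getBytes(StandardCharsets.UTF_8);
        System.out.println(compareUnsigned(apple, melon) < 0);              // original order
        System.out.println(compareUnsigned(flip(apple), flip(melon)) > 0);  // flipped order
    }
}
```

Applying flip twice returns the original row, so the same method decodes. One limitation of this sketch: when one row is a prefix of another, the shorter row still sorts first after flipping, so the reversal is exact only for rows that differ before either one ends (for example, fixed-length rows).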

Re: Filtering rows by presence of keys

2012-05-25 Thread Adam Fuchs
One of the differences you'll see between WholeRowIterator and RowFilter is that WholeRowIterator buffers an entire row in memory while RowFilter does not. Each includes a boolean method that you would override in a subclass -- acceptRow(...) in RowFilter or filter(...) in WholeRowIterator. In

Re: ROW ID Iterator - sanity check

2012-05-19 Thread Adam Fuchs
One issue here is you are mixing Iterator and Iterable in the same object. Usually, an Iterable will return an iterator at the beginning of some logical sequence, but your iterable returns the same iterator object over and over again. This state sharing would make it so that you can really only
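
The Iterator/Iterable distinction can be sketched as follows. Both nested classes are hypothetical and are not the code under review in the thread; they only contrast an Iterable that hands out a fresh Iterator on each call with one that returns the same shared object.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IterableSketch {
    // Conventional Iterable: every call to iterator() starts at the beginning.
    static class FreshRows implements Iterable<String> {
        private final List<String> rows;
        FreshRows(List<String> rows) { this.rows = rows; }
        @Override
        public Iterator<String> iterator() { return rows.iterator(); }
    }

    // Mixed Iterable/Iterator: iterator() returns this, so state is shared
    // and the sequence can effectively be walked only once.
    static class SharedRows implements Iterable<String>, Iterator<String> {
        private final List<String> rows;
        private int position = 0;
        SharedRows(List<String> rows) { this.rows = rows; }
        @Override
        public Iterator<String> iterator() { return this; }
        @Override
        public boolean hasNext() { return position < rows.size(); }
        @Override
        public String next() { return rows.get(position++); }
    }

    // Counts the elements seen in one full pass.
    static int count(Iterable<String> rows) {
        int n = 0;
        for (String ignored : rows) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("r1", "r2", "r3");
        FreshRows fresh = new FreshRows(data);
        SharedRows shared = new SharedRows(data);
        System.out.println(count(fresh) + " then " + count(fresh));
        System.out.println(count(shared) + " then " + count(shared));
    }
}
```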

Re: recursive rollup

2012-04-12 Thread Adam Fuchs
Small correction: the branching factor would not have to be exactly 1, but it would be small on average (close to 1). Adam On Thu, Apr 12, 2012 at 12:50 PM, Adam Fuchs adam.p.fu...@ugov.gov wrote: This probably won't work, unless all node names are unique at a given level. For example, given

Re: Newbie Install/Setup Questions

2012-04-10 Thread Adam Fuchs
Sam, Yes, Accumulo 1.4.0 should be compatible with Hadoop 1.0.1 after you remove that check. We've run with it some, but mostly we've tested with 0.20.x. Please let us know if you see any compatibility problems. There are two possibilities for why your second tablet server did not start. Either