The iterator in the gist also counts cells/entries/KV pairs, not unique
rows. You'll want to have some way to skip to the next row value if you
want the count to be reflective of the number of rows being read.
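A minimal sketch of that row-counting idea, using plain strings in place of Accumulo keys (the entry layout and class name here are illustrative, not from the gist):

```java
import java.util.List;

// Counts unique rows in a sorted stream of (row, column) entries,
// analogous to advancing past the remaining cells of a row once that
// row has been counted, rather than counting every KV pair.
public class RowCounter {
    public static int countRows(List<String[]> sortedEntries) {
        int rows = 0;
        String lastRow = null;
        for (String[] entry : sortedEntries) {
            String row = entry[0];
            if (!row.equals(lastRow)) { // first entry of a new row
                rows++;
                lastRow = row;          // skip the rest of this row
            }
        }
        return rows;
    }
}
```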
On Fri, Jul 15, 2016 at 3:34 PM, Shawn Walker
wrote:
> My
I wonder if the file isn't being decrypted properly. I don't see why it
would write out incompatible file versions.
On Fri, Jul 8, 2016 at 3:02 PM, Josh Elser wrote:
> Interesting! I have not run into this one before.
>
> You could use `accumulo rfile-info`, but I'd guess
You could also shade/relocate dependency classes within the uber/fat jar.
It has pitfalls but it is very easy to set up.
On Thursday, July 7, 2016, Massimilian Mattetti wrote:
> Hi Jim,
>
> the approach of using namespace from HDFS looks promising. I need to
> investigate a
Cool beans, Keith!
On Tue, Jan 19, 2016 at 11:30 AM, Keith Turner wrote:
> The Fluo project is happy to announce a 1.0.0-beta-2[1] release which is
> the
> third release of Fluo and likely the final release before 1.0.0. Many
> improvements in this release were driven by the
Thanks, Jonathan! I've wondered about specific numbers on this topic when
dealing with geohashes, so this is a very useful tool.
On Sun, Oct 25, 2015 at 11:22 AM, Jonathan Wonders
wrote:
> I have been able to put some more thought into this over the weekend and
> make
Soup gave a talk about something down this alley:
https://www.youtube.com/watch?v=aedejUXWrV0
On Thu, Oct 1, 2015 at 2:58 PM, Keith Turner wrote:
> Could possibly use a ThreadLocal containing a SoftReference
>
> Another place you could possibly put this code instead of in a
By Hadoop do you mean a Yarn NodeManager process?
On Mon, Aug 17, 2015 at 4:21 PM, Jeff Kubina jeff.kub...@gmail.com wrote:
On each of the processing nodes in our cluster we have running 1) HDFS
(datanode), 2) Accumulo (tablet server), and 3) Hadoop. Since Accumulo
depends on the HDFS, and
Hi all,
I've had some questions from users regarding setting
`hive.auto.convert.sortmerge.join.noconditionaltask`. I see, in some
documentation from users and vendors, that it is recommended to set this
parameter. In neither Hive 0.12 nor 0.14 can I find in HiveConf where this
is actually defined
You are correct sir!
On Tue, Aug 4, 2015 at 3:42 PM, Josh Elser josh.el...@gmail.com wrote:
Might you have meant to send this to u...@hive.apache.org?
William Slacum wrote:
Hi all,
I've had some questions from users regarding setting
`hive.auto.convert.sortmerge.join.noconditionaltask
Swap out 1.5 in the previous link for the version you're probably using.
Which charts are you looking at for the compactions? Usually it's just the
number of compactions currently running for the system.
On Thu, Jul 30, 2015 at 7:10 PM, William Slacum wsla...@gmail.com wrote:
See
http
See
http://accumulo.apache.org/1.5/apidocs/org/apache/accumulo/core/client/admin/TableOperations.html#flush%28java.lang.String,%20org.apache.hadoop.io.Text,%20org.apache.hadoop.io.Text,%20boolean%29
for minor compacting (aka flushing) a table via the API.
On Thu, Jul 30, 2015 at 5:52 PM, Hai
Look in ConfiguratorBase for how it converts enums to config keys. These
are the two methods that are used:
/**
* Provides a configuration key for a given feature enum, prefixed by the
implementingClass
*
* @param implementingClass
* the class whose name will be used as a
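A hedged sketch of what such a method does, building a config key from the implementing class and a feature enum. The exact separator and casing Accumulo uses may differ; treat this format as an assumption, not the real ConfiguratorBase code:

```java
// Builds a configuration key for a feature enum, prefixed by the
// implementing class. The "Class.Enum.value" layout here is an
// assumption for illustration purposes.
public class ConfKeys {
    public static String enumToConfKey(Class<?> implementingClass, Enum<?> e) {
        return implementingClass.getSimpleName() + "."
            + e.getDeclaringClass().getSimpleName() + "."
            + e.name().toLowerCase();
    }

    enum Feature { OUTPUT_TABLE }
}
```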
What do you mean by multiple entries? Are you doing something similar to
the WholeRowIterator, which encodes all the entries for a given row into a
single key value?
Are you using any other iterators?
In general, calls to `hasTop()`, `getTopKey()` and `getTopValue()` should
not change the state
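The "no state change" rule can be illustrated with a plain-Java lookahead wrapper (names are illustrative, not the Accumulo interface): inspecting the top element is idempotent, and only next() advances the source.

```java
import java.util.Iterator;

// A one-element lookahead wrapper: hasTop()/getTop() may be called any
// number of times without advancing the underlying source; only next()
// moves the cursor. Accumulo iterators should behave the same way.
public class Lookahead<T> {
    private final Iterator<T> source;
    private T top;

    public Lookahead(Iterator<T> source) {
        this.source = source;
        next(); // prime the buffer
    }

    public boolean hasTop() { return top != null; } // no side effects
    public T getTop() { return top; }               // no side effects

    public void next() { top = source.hasNext() ? source.next() : null; }
}
```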
What does your code look like?
I've seen issues where I have some code of the form:
BatchScanner s = connector.createBatchScanner(...);
for (Entry<Key,Value> e : s) { System.out.println(e); }
This usually results in an InterruptedException because the
TabletServerBatchReaderIterator doesn't seem to have a
woohoo
Look forward to getting to use this!
On Thu, Oct 9, 2014 at 4:54 PM, Corey Nolet cjno...@gmail.com wrote:
The Fluo project is happy to announce the 1.0.0-alpha-1 release of Fluo.
Fluo is a transaction layer that enables incremental processing on top of
Accumulo. It integrates into
This comes up a bit, so maybe we should add it to the FAQ (or just have
better information about iterators in general). The short answer is that
it's usually not recommended, because there aren't strong guarantees about
the lifetime of an iterator (so we wouldn't know when to close any
resources
Going through the proxy will always be an extra RPC step over using a Java
client. Eliminating that step, I think, would net the most benefit.
On Mon, Aug 11, 2014 at 12:16 AM, John R. Frank j...@diffeo.com wrote:
Josh,
Following up on this earlier post about the proxy:
I have seen issues if I don't have an explicit close on the batch scanner.
When I don't have the close, the gc ends up calling `finalize()` which
closes the thread pool. Basically, the work around is to manage the
lifetime of the instance yourself, rather than leave it up to fate.
On Sun, Aug 3,
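The failure mode can be sketched without Accumulo at all: a resource that owns a thread pool behaves predictably only when closed explicitly (e.g. via try-with-resources) instead of waiting for the garbage collector to run finalize(). The class name below is illustrative:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Stand-in for a BatchScanner-like resource that owns a thread pool.
// Relying on finalize() to shut the pool down makes its lifetime
// unpredictable; an explicit close() does not.
public class PooledScanner implements AutoCloseable {
    private final ExecutorService pool = Executors.newFixedThreadPool(2);

    public boolean isClosed() { return pool.isShutdown(); }

    @Override
    public void close() { pool.shutdownNow(); }
}
```

Wrapping it in `try (PooledScanner s = new PooledScanner()) { ... }` guarantees the pool is torn down when the block exits, no matter how it exits.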
Quick google search yielded:
https://github.com/GeoLatte/geolatte-geom/blob/master/src/main/java/org/geolatte/geom/curve/MortonCode.java
On Thu, Jul 24, 2014 at 10:10 AM, THORMAN, ROBERT D rt2...@att.com wrote:
Can anyone share a Java method to convert lat/lon (decimal degrees) to
Z-Curve
[tserver.TabletServer] DEBUG: ScanSess tid
10.0.2.15:44992 8 *0 entries* in 0.01 secs, nbTimes = [6 6 6.00 1]
No exceptions otherwise. Really appreciate all the ongoing help.
Best,
-Mike
On Mon, Jul 14, 2014 at 6:40 PM, William Slacum
wilhelm.von.cl...@accumulo.net wrote:
Anything in your Tserver
Hi Mike!
The Combiner interface is only for aggregating keys within a single row.
You can probably get away with implementing your combining logic in a
WrappingIterator that reads across all the rows in a given tablet.
To do some combine/fold/reduce operation, Accumulo needs the input type to
be
For a bit of pseudocode, I'd probably make a class that did something akin
to: http://pastebin.com/pKqAeeCR
I wrote that up real quick in a text editor-- it won't compile or anything,
but should point you in the right direction.
On Mon, Jul 14, 2014 at 3:44 PM, William Slacum
wilhelm.von.cl
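This is not the pastebin code, but the cross-row combining idea can be sketched in plain Java: fold every entry, regardless of which row it came from, into a per-column aggregate, which is exactly what a Combiner (scoped to a single row) cannot do:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Combines values across rows: folds every (row, column, value) entry
// into a per-column sum, ignoring row boundaries.
public class CrossRowCombiner {
    public static Map<String, Long> sumByColumn(List<String[]> entries) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (String[] e : entries) {          // e = {row, column, value}
            totals.merge(e[1], Long.parseLong(e[2]), Long::sum);
        }
        return totals;
    }
}
```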
scan*
*root@dev pojo*
Best,
-Mike
On Mon, Jul 14, 2014 at 4:07 PM, William Slacum
wilhelm.von.cl...@accumulo.net wrote:
For a bit of pseudocode, I'd probably make a class that did something
akin to: http://pastebin.com/pKqAeeCR
I wrote that up real quick in a text editor-- it won't
thoughts? What's the best way to debug these?
On Mon, Jul 14, 2014 at 5:14 PM, William Slacum
wilhelm.von.cl...@accumulo.net mailto:wilhelm.von.cl...@accumulo.net
wrote:
Ah, an artifact of me just willy nilly writing an iterator :) Any
reference to `this.source` should be replaced
If the zookeeper data is gone, your best bet is try and identify which
directories under /accumulo/tables points to which tables you had. You can
then bulk import the files into a new instance's tables.
On Sun, Jul 13, 2014 at 11:54 PM, Vicky Kak vicky@gmail.com wrote:
I am not sure if the
I had a similar thread going on and am currently rummaging through the
batch writer code (as well as pontificating on how the tablet server
handles multiple write clients for the tablet).
What is your ingest skew like? Is it uniform? How quickly do splits occur?
I've seen, at relatively low
I can try to confirm that, but the monitor isn't showing any failures
during ingest. By half dead do you mean the master thinks it is alive,
but in actuality it isn't?
On Fri, Jun 20, 2014 at 10:32 AM, Keith Turner ke...@deenlo.com wrote:
On Thu, Jun 19, 2014 at 11:57 PM, William Slacum
is a byte used for doing an ordering on rows that share the same prefix.
There was a presentation floating around on the specifics of the metadata
table at one point. I believe that helps tablet information sort before the
last tablet, which is suffixed with '~', to force it to sort after the
I think first and foremost, how has writing your application been? Is it
something you can easily onboard other people for? Does it seem stable
enough? If you can answer those questions positively, I think you have a
winning situation.
The big three Hadoop vendors (Cloudera, Hortonworks and MapR)
I'm finding some ingest jobs I have running in a bit of a sticky sitch:
I have a MapReduce job that reads a table, transforms the entries, creates
an inverted index, and writes out mutations to two tables. The cluster size
is in the tens of nodes, and I usually have 32 mappers running.
The batch
How much of this is a standalone utility? I think a magic button approach
would be good for this case.
On Mon, Jun 16, 2014 at 5:24 PM, Sean Busbey bus...@cloudera.com wrote:
In an effort to get more users off of our now unsupported 1.4 release,
should we support upgrading directly to 1.6
Wouldn't the iterator have to be on the classpath for the JVM that launches
the shell command?
On Sun, Jun 15, 2014 at 9:02 AM, Vicky Kak vicky@gmail.com wrote:
setiter -n MyIterator -p 10 -scan -minc -majc -class
com.codebits.d4m.iterator.MyIterator
scan
The above line fails for me
is throwing that Exception?
On Jun 15, 2014 8:50 AM, William Slacum wilhelm.von.cl...@accumulo.net
wrote:
Wouldn't the iterator have to be on the classpath for the JVM that
launches the shell command?
On Sun, Jun 15, 2014 at 9:02 AM, Vicky Kak vicky@gmail.com wrote:
setiter -n
By blocking, we mean you have to complete the entire index look up before
fetching your records.
Conceptually, instead of returning a `Collection<Text> rows`, return an
`Iterator<Text> rows` and consume them in batches as the first look up
produces them. That way record look ups can occur in parallel
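A self-contained sketch of the batching pattern (names are illustrative): consume the index iterator in fixed-size chunks so record lookups can start before the index scan finishes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Consumes an iterator of row ids in fixed-size batches, handing each
// batch off for record lookup as soon as it fills.
public class BatchedLookup {
    public static void inBatches(Iterator<String> rows, int batchSize,
                                 Consumer<List<String>> lookup) {
        List<String> batch = new ArrayList<>(batchSize);
        while (rows.hasNext()) {
            batch.add(rows.next());
            if (batch.size() == batchSize) {
                lookup.accept(new ArrayList<>(batch)); // kick off this batch
                batch.clear();
            }
        }
        if (!batch.isEmpty()) lookup.accept(batch);    // final partial batch
    }
}
```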
You could save the splits, delete the table, then reapply the splits.
On Mon, May 12, 2014 at 9:23 AM, BlackJack76 justin@gmail.com wrote:
Besides using the tableOperations to deleteRows or delete the table
entirely,
what is the fastest way to delete all data in a table? I am currently
You could do mutations or bulk loading. As long as you can phrase your data
in terms of keys and values, you can store it in Accumulo.
On Tue, Apr 29, 2014 at 1:48 PM, Geoffry Roberts threadedb...@gmail.com wrote:
David started this thread yesterday. Since then I have read everything, I
Our own Keith Turner is trying to make this possible with Accismus (
https://github.com/keith-turner/Accismus). I don't know the current state
of it, but I believe it's still in the early stages.
I've always been under the impression that launching a scanner or writer
from within an iterator, as
Depending on your table schema, you'll probably want to translate an object
graph into multiple mutations.
On Thu, Apr 24, 2014 at 8:40 PM, David Medinets david.medin...@gmail.com wrote:
If the sub-document changes, you'll need to search the values of every
Accumulo entry?
On Thu, Apr 24,
java.io.FileNotFoundException: File does not exist:
bulk/entities_fails/failures
sticks out to me. it looks like a relative path. where does that directory
exist on your file system?
On Tue, Apr 8, 2014 at 9:40 AM, pdread paul.r...@siginttech.com wrote:
Hi
I interface to an accumulo cloud
The extension is .rf. Are you using an RFile.Writer?
On Tue, Apr 8, 2014 at 1:29 PM, pdread paul.r...@siginttech.com wrote:
Josh
As I had stated in one of my previous posts I am using FileSystem. I am
using the code from the MapReduce bulk ingest without the MapReduce. I did
feed the
Thanks, Joe!
On Fri, Mar 7, 2014 at 2:01 PM, joeferner joe.m.fer...@gmail.com wrote:
Submitted the patch here: ACCUMULO-2439
https://issues.apache.org/jira/browse/ACCUMULO-2439
--
View this message in context:
FWIW you can probably avoid the scan by making your insert idempotent aside
from the timestamp and let versioning handle deduplication.
On Wed, Feb 12, 2014 at 1:19 PM, Ariel Valentin ar...@arielvalentin.com wrote:
Sorry but I am not at liberty to be specific about our business problem.
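A self-contained sketch of why the idempotent-insert approach works: if a write is identical apart from its timestamp, keeping only the highest-timestamped version per key deduplicates it, which is the effect of versioning configured to retain one version. The class and key layout here are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

// Keeps only the newest timestamped value per logical key, mimicking
// versioning that retains a single version: re-writing the same key
// with a fresh timestamp deduplicates rather than accumulating.
public class LatestVersion {
    private final Map<String, Long> timestamps = new HashMap<>();
    private final Map<String, String> values = new HashMap<>();

    public void put(String key, long ts, String value) {
        Long prev = timestamps.get(key);
        if (prev == null || ts >= prev) { // newer version wins
            timestamps.put(key, ts);
            values.put(key, value);
        }
    }

    public int size() { return values.size(); }
    public String get(String key) { return values.get(key); }
}
```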
Filters (and more generally, iterators) are executed on the server. There
is an option to run them client side. See
http://accumulo.apache.org/1.4/apidocs/org/apache/accumulo/core/client/ClientSideIteratorScanner.html
Using fetchColumnFamily will return only keys that have specific column
family
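The effect can be sketched client-side as a plain filter over (family, qualifier) pairs. This is only an illustration; with a real Scanner the filtering happens server-side once you call fetchColumnFamily.

```java
import java.util.List;
import java.util.stream.Collectors;

// Keeps only the entries whose column family matches, which is the
// server-side effect of fetching a specific column family.
public class FamilyFilter {
    public static List<String[]> onlyFamily(List<String[]> entries, String family) {
        return entries.stream()                 // entry = {family, qualifier}
            .filter(e -> e[0].equals(family))
            .collect(Collectors.toList());
    }
}
```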
Some data on short circuit reads would be great to have.
I'm unsure how correct the postulation that compaction leads to eventual
locality is. It seems, to me at least, that in the case of a multi-block
file, the file system would eventually try to distribute those blocks
rather than leave them
If an iterator is only set at scan time, then its logic will only be
applied when a client scans the table. The data will persist through major
and minor compaction and be visible if you scanned the RFile(s) backing the
table. Suppress is the better word in this case. Would you please open a
There wasn't any discussions in those tickets as to what Hadoop 2 provides
Accumulo. If we're going to still support 1, then any new features only
possible with 2 have to become optional until we ditch support for 1. Is
there anything people have in mind, feature wise, that Hadoop 2 would help
That iterator is designed to be used with a sharded table format, where in
the index and record each occur within the same row. See the Accumulo
examples page http://accumulo.apache.org/1.4/examples/shard.html
On Tue, Oct 1, 2013 at 3:35 PM, Heath Abelson habel...@netcentricinc.com wrote:
I am
Usually the intersecting iterator is used when you're modeling a document
partitioned table. That is, you have relatively few row values compared to
the number of documents you're storing (like, on the order of hundreds to
millions of documents in a single row). It looks like you have a single row
Finding the keys after your hypothetical key is easy, as you can just make
it the first key in the range you pass to your Scanner. Since accumulo
doesn't do backwards scanning, you might have to consider having two tables
or sets of rows, one that sorts lexicographically and the other that sorts
There can also be significant overhead in starting a MR job if you're using
`-libjars` for distributing your dependencies. This effect is more
pronounced as the number of nodes increases. I would recommend looking
into the distributed cache (there's a quick description at
We could also just add a transformation from HFileReader ->
LocalityGroupReader, since I think HBase's storage model (forgive me if
there's a better term) maps pretty well to that.
On Tue, Jul 9, 2013 at 2:20 PM, dlmar...@comcast.net wrote:
I believe that Brian Loss committed code in 1.5 for a
There's an almost identical method that, instead of a CharSequence or
byte[], takes an AuthenticationToken object. If you're using user/password,
use a PasswordToken (I think that's the name of the object).
On Thu, May 30, 2013 at 4:00 PM, Newman, Elise
enew...@integrity-apps.com wrote:
Okay, I
According to https://issues.apache.org/jira/browse/HADOOP-7823 , it should be
possible to split bzip2 files in Hadoop 1.1.
On Tue, May 21, 2013 at 3:54 PM, Eric Newton eric.new...@gmail.com wrote:
The files decompress remarkably fast, too. I seem to recall about 8
minutes on our hardware.
I
I was always under the impression there was a check, presumably on the
client side, that would end a scan session if a key was returned that was
not in the original scan range.
Say I scanned my table for the range [A, B], but I had an iterator that
returned only keys beginning with C. I would
Sorry guys, I forgot to add some methods to the iterator to make it work.
http://pastebin.com/pXR5veP6
On Wed, May 1, 2013 at 8:01 PM, William Slacum
wilhelm.von.cl...@accumulo.net wrote:
I was always under the impression there was a check, presumably on the
client side, that would end a scan
And it uses the `IteratorSetting(int priority, Class<? extends SortedKeyValueIterator<Key,Value>> iterator)`
constructor, so the name of the iterator is the class itself. Naming your
iterator should be a short term fix. I created ACCUMULO-1267 to make a
smarter input format.
On Thu, Apr 11, 2013 at 2:22 PM, William Slacum
wilhelm.von.cl
The build hangs in cloudtrace for me on Mac OS 10.7.5, oddly enough on a
TSocket creation. I thought it was due to me having Thrift 0.9 installed,
but I can't see it getting picked up when I try to build via `mvn -X...`,
only thrift-0.6.1. Anyone else run into the same thing?
I'm not too worried
As an aside, do we keep track of the ingest and query rates with each
release? I know Josh had a bit of a side project to do it nightly, but it'd
be interesting to check whether or not as the project grows, we aren't
making noticeable trade offs in performance.
On Thu, Mar 14, 2013 at 10:36 AM,
So you want both auto adjusting and not auto adjusting depending on the
size of a range? I suppose you could lift the code for doing the adjusting,
and do some introspection on the ranges (such as how many tablets do I have
in this range?) and apply as necessary.
On Mon, Mar 11, 2013 at 4:47 PM,
On your accumulo master, what do you have in your conf/slaves file?
On Fri, Dec 21, 2012 at 9:43 AM, Kevin Pauli ke...@thepaulis.com wrote:
Hi, I'm trying to get my first Accumulo environment setup to evaluate it.
I've got it running within a CentOS VM, and I've setup the helloworld
data.
Rya is a triple store backed by Accumulo:
http://www.deepdyve.com/lp/association-for-computing-machinery/rya-a-scalable-rdf-triple-store-for-the-clouds-7Xh905FY0y
On Fri, Dec 21, 2012 at 2:01 PM, Keith Turner ke...@deenlo.com wrote:
Take a look at the Typo Lexicoders. A Lexicoder serializes
Did you set ZOOKEEPER_HOME in the accumulo-env.sh script or your
environment?
On Wed, Dec 19, 2012 at 2:03 PM, Kevin Pauli ke...@thepaulis.com wrote:
I'm trying to install Accumulo in CentOS. I have installed the jdk and
hadoop, but can't seem to make Accumulo install happy wrt zookeeper.
I
Nvm you're a step behind where I thought you were at. Turns out I'm of no
help :)
On Wed, Dec 19, 2012 at 2:06 PM, William Slacum
wilhelm.von.cl...@accumulo.net wrote:
Did you set ZOOKEEPER_HOME in the accumulo-env.sh script or your
environment?
On Wed, Dec 19, 2012 at 2:03 PM, Kevin Pauli
'col16' sorts lexicographically before 'col3'. You'll either need to encode
your numerics or zero pad them.
On Thu, Dec 6, 2012 at 9:03 AM, Andrew Catterall
catteralland...@googlemail.com wrote:
Hi,
I am trying to run a bulk ingest to import data into Accumulo but it is
failing at the
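The fix is visible with plain string comparison (the padding width of five is an arbitrary choice here):

```java
// Lexicographic byte order puts "col16" before "col3"; zero-padding the
// numeric part restores the intended numeric ordering.
public class Padding {
    public static String pad(int n) {
        return String.format("col%05d", n); // e.g. col00003, col00016
    }
}
```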
That shouldn't be a huge issue. How many rows/partitions do you have? How
many do you have to scan to find the specific column family/doc id you want?
On Fri, Nov 9, 2012 at 11:26 AM, Anthony Fox adfaccu...@gmail.com wrote:
I have a table set up to use the intersecting iterator pattern. The
9, 2012 at 11:39 AM, William Slacum
wilhelm.von.cl...@accumulo.net wrote:
That shouldn't be a huge issue. How many rows/partitions do you have? How
many do you have to scan to find the specific column family/doc id you want?
On Fri, Nov 9, 2012 at 11:26 AM, Anthony Fox adfaccu...@gmail.com
for both index entries and record entries. Could this be the issue? Each
record entry has approximately 30 column qualifiers with data in the value
for each.
On Fri, Nov 9, 2012 at 11:41 AM, William Slacum
wilhelm.von.cl...@accumulo.net wrote:
I guess assuming you have 10M possible
?
On Fri, Nov 9, 2012 at 11:49 AM, William Slacum
wilhelm.von.cl...@accumulo.net wrote:
I'm more inclined to believe it's because you have to search across 10M
different rows to find any given column family, since they're randomly, and
possibly uniformly, distributed. How many tablets
the same for the scan I am doing?
On Fri, Nov 9, 2012 at 12:02 PM, William Slacum
wilhelm.von.cl...@accumulo.net wrote:
So that means you have roughly 312.5k rows per tablet, which means about
725k column families in any given tablet. The intersecting iterator will
work at a row per time, so I
the impression that this would be really fast since I
have a column family bloom filter turned on. Is this not correct?
On Fri, Nov 9, 2012 at 12:15 PM, William Slacum
wilhelm.von.cl...@accumulo.net wrote:
When I said smaller number of tablets, I really meant smaller number of rows :)
My apologies.
So
At one point, Keith had warned me against kicking off threads inside a scan
session. Is it possible we could have a discussion on the implications of
this?
On Mon, Nov 5, 2012 at 11:30 AM, Billie Rinaldi bil...@apache.org wrote:
On Mon, Nov 5, 2012 at 11:24 AM, Sukant Hajra
What about the main method that calls ToolRunner.run? If you have 4 jobs
being created, then you're calling run(String[]) or runOneTable() 4 times.
On Fri, Nov 2, 2012 at 5:21 PM, Cornish, Duane C.
duane.corn...@jhuapl.edu wrote:
Thanks for the prompt response John!
When I say that
Make sure that the class is available to the tserver process. This is
done by putting the jar containing your class on all nodes under the
$ACCUMULO_HOME/lib/ext directory. If you put it under lib/ext, then you
won't need to stop and restart the process for the tserver to pick it up.
On Tue,
-1, since I'm running into the rat issue reported by Dave Medinets when
running build.sh.
On Mon, Oct 22, 2012 at 12:20 PM, Keith Turner ke...@deenlo.com wrote:
On Mon, Oct 22, 2012 at 9:52 AM, Josh Elser josh.el...@gmail.com wrote:
I agree. If it's not a quick fix, we should just revert the
22, 2012 at 10:09 PM, Eric Newton eric.new...@gmail.com wrote:
Can you identify a file that is missing a license or has an incorrect
license?
I have run the build on RHEL 6, and Ubuntu 12.04. In what environment
does the build fail?
-Eric
On Mon, Oct 22, 2012 at 10:06 PM, William Slacum
pass but I don't think
the issue on trunk is related to odp files.
On Mon, Oct 22, 2012 at 10:34 PM, William Slacum
wilhelm.von.cl...@accumulo.net wrote:
I replied to the thread David made, since I think Billie has run into the
same issue. I'm on OSX 10.7.5 and I believe
it's docs/src
If you aren't often looking at the data in the value on the tablet server
(like in an iterator), you can also pre-compress your values on ingest.
On Mon, Oct 1, 2012 at 12:19 PM, Marc Parisi m...@accumulo.net wrote:
You could compress the data in the value, and decompress the data upon
receipt
:
That is exactly my use case (ingest once, serve often, no server-side
iterators).
And I'm doing pre-compression on ingest. I was just looking to do away
with app-level compression code. Not a biggie.
Ameet
On Mon, Oct 1, 2012 at 3:32 PM, William Slacum
wilhelm.von.cl...@accumulo.net
I'm a bit confused as to what you mean if an iterator goes down
mid-processing. If it goes down at all, then whatever scope it's running
in- minor compaction, major compaction and scan- will most likely go down
as well (unless your iterator eats an exception and ignores errors). A
WALog shouldn't
Woops- slow innurnet and didn't notice Eric's response.
On Tue, Sep 11, 2012 at 9:30 AM, William Slacum
wilhelm.von.cl...@accumulo.net wrote:
You could mount a RAM disk and point HDFS to it.
On Tue, Sep 11, 2012 at 9:02 AM, Moore, Matthew J.
matthew.j.mo...@saic.com wrote:
Has anyone
An or clause should be able to handle an enumeration of values, as that's
supported in a JEXL expression. It would not, however, surprise me if those
iterators could not handle multiple rows in a tablet. If you can reproduce
that, please file a ticket. There will be a large update occurring to the
What does your TServer debug log say? Also, are you writing back out to
Accumulo?
To follow up what Jim said, you can check the zookeeper log to see if max
connections is being hit. You may also want to check and see what your max
xceivers is set to for HDFS and check your Accumulo and HDFS logs
Did you configure hadoop to store your HDFS instance/data somewhere
other than /tmp? Look up the single node set up in the Hadoop docs.
On Tue, Jul 17, 2012 at 12:07 PM, Shrestha, Tejen [USA]
shrestha_te...@bah.com wrote:
This is the error that was produced.
java.io.FileNotFoundException: File
Also it looks like your app is storing something in /tmp/files, so you
may want to make sure that you mean to be looking on your local FS or
in HDFS.
On Tue, Jul 17, 2012 at 12:27 PM, William Slacum wsla...@gmail.com wrote:
Did you configure hadoop to store your HDFS instance/data somewhere
1) The class hierarchy is a little convoluted, but there doesn't seem to be
anything necessarily broken about the
FamilyIntersectingIterator/IndexedDocIterator that would prevent it from
being backported from trunk to a 1.3.x branch. AFAIK the
SortedKeyValueIterator interface has remained
http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201203.mbox/%3ccaocnvr0osrawytau7lt+agf0bmmcwfhrgpj8_ga4u6mac2y...@mail.gmail.com%3E
It looks like the old API was given a second chance at life and is now
being billed as the stable API.
On Mon, Jul 16, 2012 at 2:39 PM, Billie J
mapred was deprecated as of 0.20.0 (
http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/InputFormat.html)
:)
On Mon, Jul 16, 2012 at 2:49 PM, Juan Moreno
jwellington.mor...@gmail.com wrote:
The hadoop API is very confusing in that regard. Currently Accumulo runs
atop 0.20
?
Would I have to do something as complex as InputFormatBase ? (It's a
mammoth class)
On Jul 16, 2012 5:53 PM, William Slacum wilhelm.von.cl...@accumulo.net
wrote:
mapred was deprecated as of 0.20.0 (
http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/InputFormat.html
I'm on a phone, so excuse the lack of info/answers, but #5 is because the
IntersectingIterator is essentially a proof of concept piece of code.
There's no reason you shouldn't be able to do one term. The Wikipedia
example is able to handle single term queries. The code is a bit rough to
read, but
Looks like the stack trace is finishing up in the Thrift stuff-- I
wonder if you have a newer version of Thrift on the client?
On Sat, Jul 14, 2012 at 10:33 PM, Josh Elser josh.el...@gmail.com wrote:
Can you post some more information about how you're running your program on
your Windows client
It can take a long time if your tablet server isn't responsive, you're
major compacting, or there's some other issue going on in your
ecosystem (ie, the NameNode/DataNode has barfed or even ZooKeeper
itself has locked up). Check your monitor to see what it's trying to
do and also check that HDFS
A tablet will contain at minimum one row. So, if you shard/partition,
eventually your data will grow to the point that each tablet will
essentially be one row.
On Jul 1, 2012 2:17 PM, Sukant Hajra qn2b6c2...@snkmail.com wrote:
I've been considering using distributed messaging service (Akka in my
By iterator stack I am referring to the Accumulo iterators. Resource
sharing among scan sessions is implemented by destroying a user scan
session and eventually recreating the iterator stack. The new stack is then
seek'd to the last key returned by the entire stack. If you were holding
some state,
Then you can think of the Intersecting (and Or) iterator as a tree of
merging keys.
So, let's assume we have the following index in a given partition. The
partition will have the row partitionN.
partitionN Bill: 1
partitionN Bill: 2
partitionN Bill: 3
partitionN Josh: 3
partitionN Josh: 4
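Continuing the example, the core operation is a merge-intersection of the sorted doc-id lists under each term: intersecting Bill's list [1, 2, 3] with Josh's [3, 4] yields [3]. A minimal two-pointer sketch:

```java
import java.util.ArrayList;
import java.util.List;

// Two-pointer intersection of two sorted doc-id lists, the merge at the
// heart of an intersecting-iterator tree.
public class Intersect {
    public static List<Integer> of(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) { out.add(a[i]); i++; j++; }
            else if (a[i] < b[j]) i++;   // advance the smaller side
            else j++;
        }
        return out;
    }
}
```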
You can use a BatchScanner and give it two ranges. It would look something
like:
ArrayList<Range> ranges = new ArrayList<Range>();
ranges.add(new Range(new Key(timestamp1)));
ranges.add(new Range(new Key(timestamp2)));
BatchScanner bs = con.createBatchScanner(...);
//set your iterators and filters
Oh, did I interpret this wrong? I originally thought all of the timestamps
would be enumerated as rows, but after re-reading, I kind of get the idea
that the rows are being used as markers in a skip list like fashion.
On Fri, Jun 29, 2012 at 11:52 AM, Adam Fuchs afu...@apache.org wrote:
You
You're pretty much on the spot regarding two aspects about the current
IntersectingIterator:
1- It's not really extensible (there are hooks for building doc IDs,
but you still need the same `partition term: docId` key structure)
2- Its main strength is that it can do the merges of sorted lists of
Did your NameNode start up correctly?
If on a local instance, you can verify this by running `jps -lm`. If
jps isn't on your path, it should be located in $JAVA_HOME/bin.
If the NameNode is not running, check your Hadoop logs. The log you
want should have namenode in the file name-- it should
So, is a global sorting order required of your iterator? That's really
the key behavioral difference in terms of output when you're dealing
with a Scanner versus a BatchScanner.
Please correct me if I'm wrong about assuming you're trying to get a
distribution for the column families that appear
You're kind of there. Essentially, you can think of your Scanner's
interactions with the TServers as a tree with a height of two. Your
Scanner is the root and its children are all of the TServers it
needs to interact with. Essentially, the operation you'd want to is
sum the number of records each