To enable remote debugging, add the following to ACCUMULO_TSERVER_OPTS in
accumulo-env.sh:
"-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8888"
In this case, you would then use port 8888 in Eclipse to start a Remote
Java Application debugging session. This assumes your TServer is
running locally. If it's running on a remote host, you could forward the
port over an SSH tunnel.
--
One problem with your iterator is that you are not returning your data
in sorted order. This is a very bad idea as it invalidates the contract
of the SortedKeyValueIterator interface and will cause you trouble in
the future.
I'm not certain this is why you are having problems with the
BatchScanner -- I would have thought it would be problematic with both
the Scanner and the BatchScanner. You may have just hit a set of
conditions under which the Scanner happened not to fail when it
should have.
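To illustrate the sorted-order point, here is a minimal sketch. It uses plain Strings in place of Accumulo Keys, and SortedEmitSketch and emitOrder are made-up names. It also assumes String ordering matches the Key ordering you need, which I believe holds for plain ASCII row IDs but may not hold for other data. Collecting into a TreeMap instead of a HashMap means the entries come back in sorted key order when you iterate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: these names are illustrative and not part of Accumulo.
class SortedEmitSketch {

    // Counts occurrences of each term and returns the terms in the order
    // that iterating over the map would emit them.
    static List<String> emitOrder(List<String> terms) {
        // TreeMap keeps its keys sorted; HashMap makes no ordering guarantee.
        Map<String, Integer> holder = new TreeMap<>();
        for (String term : terms) {
            holder.merge(term, 1, Integer::sum);
        }
        return new ArrayList<>(holder.keySet());
    }

    public static void main(String[] args) {
        // prints [apple, fig, pear]
        System.out.println(emitOrder(Arrays.asList("pear", "apple", "pear", "fig")));
    }
}
```

This only addresses ordering. It still holds every aggregated entry in memory.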
The omitted code in your myFunction() is a little scary too. You do not
want to consume all of the data in the Range at one time, as you will
cause the server to run out of memory. SKVIs are meant to be run over
data in your table _without_ keeping all of the data in memory. Think
of iterators more as functions applied to a stream of Keys and Values.
You can buffer small amounts of data in memory in an iterator (for
example, buffering a row is fairly common); however, this requires
sufficient memory on the tablet server to hold any row in the table. For
example, if you have a row with 100k key-values in it, you could run out
of memory.
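As a rough sketch of the "function over a stream" idea, the following counts entries per row while holding only the current row's state. It works on a plain java.util.Iterator of (row, value) pairs that is assumed to be sorted by row already. StreamingRowCount and countPerRow are hypothetical names, and this is not the SKVI API itself.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical sketch: plain Java stand-in for a streaming iterator, not Accumulo code.
class StreamingRowCount {

    // Lazily turns a row-sorted stream of (row, value) pairs into (row, count) pairs.
    static Iterator<Map.Entry<String, Integer>> countPerRow(
            final Iterator<Map.Entry<String, String>> source) {
        return new Iterator<Map.Entry<String, Integer>>() {
            // The single look-ahead entry is the only input kept in memory.
            private Map.Entry<String, String> pending =
                    source.hasNext() ? source.next() : null;

            @Override
            public boolean hasNext() {
                return pending != null;
            }

            @Override
            public Map.Entry<String, Integer> next() {
                if (pending == null) {
                    throw new NoSuchElementException();
                }
                String row = pending.getKey();
                int count = 0;
                // Consume only the entries that belong to the current row.
                while (pending != null && pending.getKey().equals(row)) {
                    count++;
                    pending = source.hasNext() ? source.next() : null;
                }
                return new AbstractMap.SimpleEntry<String, Integer>(row, count);
            }
        };
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> input = new ArrayList<>();
        input.add(new AbstractMap.SimpleEntry<String, String>("r1", "a"));
        input.add(new AbstractMap.SimpleEntry<String, String>("r1", "b"));
        input.add(new AbstractMap.SimpleEntry<String, String>("r2", "c"));
        Iterator<Map.Entry<String, Integer>> counts = countPerRow(input.iterator());
        while (counts.hasNext()) {
            System.out.println(counts.next()); // prints r1=2 then r2=1
        }
    }
}
```

Because the input is sorted, the output comes out in sorted order too, and nothing beyond one row's running count is retained.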
madhvi wrote:
Thanks Josh.
Outline of my code is:
import java.io.IOException;
import java.util.Collection;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import org.apache.accumulo.core.data.ByteSequence;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.WrappingIterator;
import org.apache.hadoop.io.Text;

public class TestIterator extends WrappingIterator {
    private HashMap<String, Integer> holder = new HashMap<>();
    private Iterator<Map.Entry<String, Integer>> entries = null;
    private Map.Entry<String, Integer> entry = null;
    private Key emitKey;
    private Value emitValue;

    @Override
    public void seek(Range range, Collection<ByteSequence> columnFamilies,
            boolean inclusive) throws IOException {
        super.seek(range, columnFamilies, inclusive);
        myFunction();
    }

    private void myFunction() throws IOException {
        while (super.hasTop()) {
            // matched the condition and put values to holder map.
        }
        // iterate the map holder.
        entries = holder.entrySet().iterator();
    }

    @Override
    public Key getTopKey() {
        return emitKey;
    }

    @Override
    public Value getTopValue() {
        return emitValue;
    }

    @Override
    public boolean hasTop() {
        return entries.hasNext();
    }

    @Override
    public void next() throws IOException {
        try {
            entry = entries.next();
            // put the keys of map to rowid and values of map to
            // columnqualifier through emitKey
            emitKey = new Key(new Text(entry.getKey()), new Text(),
                    new Text(String.valueOf(entry.getValue())));
            // return 1 in emitValue.
            emitValue = new Value("1".getBytes());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
This code returns results when using a Scanner but not when using a
BatchScanner.
Also, how do I enable a remote debugger in Accumulo?
Thanks
Madhvi
On Monday 15 June 2015 09:21 PM, Josh Elser wrote:
It's hard to remotely debug an iterator, especially when we don't know
what it's doing. If you can post the code, that would help
tremendously. Instead of dumping values to a text file, you may fare
better by attaching a remote debugger to the TabletServer and setting
a breakpoint on your SKVI.
The only thing I can say is that a Scanner and BatchScanner should
return the same data, but the invocations in the server to fetch that
data are performed differently. It's likely that due to the
differences in the implementations, you uncovered a bug in your iterator.
One common pitfall is incorrectly handling something we refer to as a
"re-seek". Hypothetically, take a query scanning over [0, 9], and we
have one key per number in the range (10 keys).
As the name implies, the BatchScanner fetches batches from a server.
Suppose that after 3 keys, the server-side buffer fills up. Thus,
the client will get keys [0,2]. In the server, the next time you fetch
a batch, a new instance of the iterator will be constructed (via
deepCopy()). seek() will then be called, but with a new range that
starts just after the last key that was already returned. Thus, your
iterator would be seeked with (2,9] instead of [0,9] again.
I can't say whether or not you're actually hitting this case, but it's
a common pitfall that affects devs.
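To make the re-seek scenario concrete, here is a toy model of it. This is not Accumulo's server code, and ReseekSketch, scan and emitOrder are hypothetical names. Integers stand in for Keys, and each batch is served by a fresh "iterator" that is seeked to the range after the last key handed to the client. The emitOrder parameter lets you compare an iterator that emits in sorted order with one that buffers the range and emits in some other order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Toy model only: not Accumulo's server code, and all names are hypothetical.
class ReseekSketch {

    // Returns every key the client would see when scanning [start, end]
    // in batches of batchSize, with a re-seek before each new batch.
    static List<Integer> scan(TreeSet<Integer> table, int start, int end,
            int batchSize, Comparator<Integer> emitOrder) {
        List<Integer> returned = new ArrayList<>();
        Integer last = null; // last key handed to the client
        while (true) {
            // A fresh "iterator" per batch: [start, end] first, then (last, end].
            NavigableSet<Integer> range = (last == null)
                    ? table.subSet(start, true, end, true)
                    : table.subSet(last, false, end, true);
            // The "iterator" buffers its range and emits it in emitOrder.
            List<Integer> buffered = new ArrayList<>(range);
            buffered.sort(emitOrder);
            int count = 0;
            for (Integer key : buffered) {
                if (count == batchSize) {
                    break; // server-side buffer is full
                }
                returned.add(key);
                last = key;
                count++;
            }
            if (count < batchSize) {
                return returned; // nothing left in the range
            }
        }
    }

    public static void main(String[] args) {
        TreeSet<Integer> table = new TreeSet<>();
        for (int i = 0; i <= 9; i++) {
            table.add(i);
        }
        // Sorted emission: prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
        System.out.println(scan(table, 0, 9, 3, Comparator.<Integer>naturalOrder()));
        // Unsorted emission: prints [9, 8, 7, 9, 8]
        System.out.println(scan(table, 0, 9, 3, Comparator.<Integer>reverseOrder()));
    }
}
```

In this model, the iterator that emits in sorted order survives the re-seeks, while the one that emits out of order both repeats keys and never returns 0 through 6, because the re-seek range is derived from the last key returned.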
madhvi wrote:
@josh
If seek had been called again after hasTop and getTopKey, it would
also appear in my call hierarchy, because I am writing every function
call to a file.
So is the problem that I am calling myFunction() in seek?
After seek, getTopKey and getTopValue should be called, and then hasTop
and next, but what actually happens is that getTopValue is sometimes
called and sometimes not. This happens when I read entries through a
BatchScanner; when scanning through a Scanner, getTopValue is called.
Applying the same iterator with a Scanner and a BatchScanner, I get
entries returned through the Scanner but no entries through the
BatchScanner.
So can you please explain?