I think there's only one difference between invoking an iterator
via a scan and via a major compaction: the batching of Key-Values being
returned to the client. A side effect of this is that after a batch of
data is returned from the server to the client, it's common for a new
instance of the iterator to be instantiated. You could check whether a lot of
instances of your iterator are being created.
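One way to check how often an iterator is re-created is to count instances with a static counter. The stdlib-only toy model below is not Accumulo's API: `CountingIterator` and `ToyBatchedScan` are hypothetical names, and it assumes for simplicity that a new iterator instance is built for every batch, re-seeking just past the last key already returned.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model, NOT Accumulo's API: both classes here are hypothetical.
class CountingIterator {
    // Incremented once per constructed instance; a real iterator could bump
    // (and log) a similar counter from its init() method.
    static final AtomicInteger INSTANCES = new AtomicInteger();

    private final Iterator<String> source;

    CountingIterator(TreeMap<String, String> data, String resumeAfter) {
        INSTANCES.incrementAndGet();
        // Re-seek: skip every key up to and including the last one already returned.
        NavigableMap<String, String> view =
                (resumeAfter == null) ? data : data.tailMap(resumeAfter, false);
        this.source = view.keySet().iterator();
    }

    boolean hasTop() { return source.hasNext(); }

    String next() { return source.next(); }
}

class ToyBatchedScan {
    // Returns all keys in order, building a new CountingIterator for each batch.
    static List<String> scan(TreeMap<String, String> data, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> out = new ArrayList<>();
        String last = null;
        while (true) {
            CountingIterator it = new CountingIterator(data, last);
            int n = 0;
            while (it.hasTop() && n < batchSize) {
                last = it.next();
                out.add(last);
                n++;
            }
            if (!it.hasTop()) {
                return out; // source exhausted: the scan is complete
            }
            // Batch is full: "send" it to the client and resume with a new instance.
        }
    }
}
```

With 10 keys and a batch size of 4, this toy builds three iterator instances (4 + 4 + 2 keys), whereas a single pass over the same data would build one.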
Anything unique about the distribution of data? Very large values?
Depending on how you did your timings (at the client or within the
iterator itself), you might be seeing extra time spent in Thrift
(extra serialization).
If you issued the major compaction through the client API, there is a
boolean option that will wait for the compaction to finish. Otherwise,
compactions are asynchronous.
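In the Accumulo 1.x Java client, that option is the `wait` argument of `TableOperations.compact(tableName, start, end, flush, wait)` (check the Javadoc for your version). Because the real call needs a running instance, the sketch below is only a stdlib toy of the blocking-versus-asynchronous difference; `ToyTableOps` is a hypothetical class, and a single background thread stands in for the tablet servers.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model (hypothetical, not Accumulo's API) of a compaction request
// with an optional "wait" flag.
class ToyTableOps implements AutoCloseable {
    // A single background thread plays the role of the servers doing the work.
    private final ExecutorService server = Executors.newSingleThreadExecutor();
    final AtomicBoolean compacted = new AtomicBoolean(false);

    void compact(boolean wait) {
        // Submitting the request returns immediately with a handle to the work.
        Future<?> done = server.submit(() -> compacted.set(true));
        if (!wait) {
            return; // asynchronous: the "compaction" may still be running
        }
        try {
            done.get(); // block until the "compaction" has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for compaction", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("compaction failed", e.getCause());
        }
    }

    @Override
    public void close() {
        server.shutdown();
    }
}
```

Checking `compacted` right after `compact(false)` may or may not see `true`, which is the asynchronous behaviour; after `compact(true)` it always does, so the return of the waiting call is itself the programmatic signal that the compaction ended.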
shweta.agrawal wrote:
On Tuesday 31 March 2015 06:00 PM, shweta.agrawal wrote:
On Monday 30 March 2015 08:03 PM, Josh Elser wrote:
Why are you using a print writer to get output from your iterator?
Just use a logger and look in
$ACCUMULO_HOME/logs/tserver_$hostname.debug.log (or wherever you
configured logging). Create a log4j or slf4j Logger and use that
instead of a print writer. (It's possible that your print writer is
also what is slowing things down.)
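A minimal sketch of logging from inside an iterator instead of writing to a file. It assumes a plain `java.util.Iterator` stands in for Accumulo's iterator interface and uses `java.util.logging` only to keep the example dependency-free, in place of the log4j/slf4j Logger suggested above; `LoggingPassThrough` is a hypothetical name. The `isLoggable` guard avoids building the message string at all when fine-grained logging is off, which matters in a method called once per key-value.

```java
import java.util.Iterator;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical pass-through iterator that logs each returned entry
// instead of writing it to a file with a print writer.
class LoggingPassThrough implements Iterator<String> {
    // java.util.logging stands in for the tserver's log4j/slf4j logging here.
    private static final Logger LOG = Logger.getLogger(LoggingPassThrough.class.getName());

    private final Iterator<String> source;
    private long seen = 0;

    LoggingPassThrough(Iterator<String> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public String next() {
        String kv = source.next();
        seen++;
        // Only build the message when FINE (debug-like) logging is enabled.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("next() #" + seen + " returned " + kv);
        }
        return kv;
    }
}
```

In an actual iterator the same pattern would use a `static final` slf4j Logger from `LoggerFactory.getLogger(...)`, and the messages would end up in the tserver debug log mentioned above.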
In most real deployments, iterators should be faster on the server
side than your client because you have N servers performing the work
instead of your one client.
It's not unheard of for a programming error to slow down your
iterator. Looking at what your iterator does (via logging) should
help. Alternatively, you can use a remote debugger, connect to the
tabletserver, and set breakpoints inside your iterator.
shweta.agrawal wrote:
On Monday 30 March 2015 09:58 AM, shweta.agrawal wrote:
Hi,
Actually, I am working on an iterator, which I ran on the server side by
packaging it as a jar, and also on the client side on the same data. But on
the server side, the jar I made runs slower than on the client side, and I am
not able to find what went wrong. Is it possible for the same logic to run
faster on the client side than in Accumulo iterators?
Time on client side: 8s
Time on server side: 30s
To get the output, I am writing it to a text file through a print
writer. To perform my task, I call my method from the next method, and
I write the output to the file in next as well. What I actually want to
know is which method is called last, so that I can write my output to a
file after all the work is done.
Thanks and Regards
Shweta
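A sketch of doing the "final" work when the source runs dry, assuming there is no end-of-data callback to hook into (Accumulo's SortedKeyValueIterator interface defines init, seek, next, hasTop, getTopKey, getTopValue and deepCopy, but no close). The class `SummarizingIterator` is hypothetical, uses a plain `java.util.Iterator` as a stand-in, and writes to a list where a real iterator would log. Given the batching described above, a scan-time instance may be discarded before it ever sees its source run dry, and any instance only summarizes the data it saw itself.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical pass-through iterator that emits a summary exactly once,
// when it first notices that its source is exhausted.
class SummarizingIterator implements Iterator<Integer> {
    private final Iterator<Integer> source;
    private final List<String> output; // stand-in for a log or file sink
    private long sum = 0;
    private boolean finished = false;

    SummarizingIterator(Iterator<Integer> source, List<String> output) {
        this.source = source;
        this.output = output;
    }

    @Override
    public boolean hasNext() {
        boolean more = source.hasNext();
        if (!more && !finished) {
            // End of input: the last point at which this instance can do its work.
            finished = true;
            output.add("sum=" + sum);
        }
        return more;
    }

    @Override
    public Integer next() {
        Integer v = source.next();
        sum += v; // accumulate per-instance state while passing values through
        return v;
    }
}
```

Calling `hasNext()` again after exhaustion does not emit a second summary, because the `finished` flag is set the first time.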
Hi,
Even without the print writer, it is taking the same time. I am trying
to use a remote debugger as you suggested, but I am facing a problem.
To enable remote debugging, I changed this in the accumulo-env.sh file:
test -z "$ACCUMULO_TSERVER_OPTS" && export ACCUMULO_TSERVER_OPTS="${POLICY} -Xmx384m -Xms384m -Xdebug -Xrunjdwp:transport=dt_socket,server=y,address=50095"
But after this change, Accumulo is not working. The terminal shows that
it started, but when I go to the Accumulo shell it says there are no
tablet servers. Please help me out with this. Am I doing something
wrong?
The monitor and tserver are not starting. Their logs are:
Monitor Logs:
2015-03-31 17:36:09,516 [mortbay.log] INFO : Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2015-03-31 17:36:09,535 [mortbay.log] INFO : jetty-6.1.26
2015-03-31 17:36:09,607 [mortbay.log] WARN : failed SocketConnector@shweta:50095: java.net.BindException: Address already in use
2015-03-31 17:36:09,608 [mortbay.log] WARN : failed Server@6555694: java.net.BindException: Address already in use
2015-03-31 17:36:09,608 [mortbay.log] INFO : Stopped SocketConnector@shweta:50095
Tserver Logs:
2015-03-31 17:28:49,206 [tabletserver.TabletServer] INFO : unloaded !0;~;!0<
2015-03-31 17:28:49,298 [tabletserver.TabletServer] INFO : unloaded !0<;~
2015-03-31 17:28:50,074 [tabletserver.TabletServer] INFO : unloaded !0;!0<<
2015-03-31 17:28:50,121 [tabletserver.TabletServer] FATAL: Lost tablet server lock (reason = LOCK_DELETED), exiting.
2015-03-31 17:28:50,122 [tabletserver.TabletServer] INFO : Master requested tablet server halt
Thanks and Regards
Shweta
Hi,
Thanks for all your help. I got the logs from
$ACCUMULO_HOME/logs/tserver_$hostname.debug.log. After analysing them and
setting the iterator to run at the major compaction scope, I found that
the iterator speeds up: I was able to complete the computation in 887
ms. So now I want to ask: why is there a difference in execution
time when I run the same iterator at the major compaction scope versus the
scan scope? Also, is there a way to detect the end of a major compaction
programmatically?
Thanks and Regards
Shweta