Re: HDP, Hive + Ignite

Ivan Veselovsky Thu, 27 Apr 2017 08:02:24 -0700

Hi, Alena
 
1. logs you have attached show some errors, but , in fact, I cannot deal
with them until the way to reproduce the problem is known.


2. Here I mean that IGFS (write-through cache built upon another file
system) and the Ignite map-reduce engine (jobtracker on port 11211) are 2
independent things , and can be used independently one of the other. That
means that you can use IGFS without Ignite map-reduce, and Ignite Map-reduce
without IGFS. If you experience some problem using them both, an idea to
track down the issue is to try the same job (1) without Ignite at all, (2)
with IGFS , but without Ignite map-reduce, (3) without IGFS, but with Ignite
job-tracker. This may help to understand, what subsystem causes the problem.
Similar approach can be used when trying to speed up a task.

3. This option should be a property of Hadoop job, so it can either be set
in global Hadoop configuration, or set for concrete hadoop job, e.g. 
jar
./hadoop-2.7.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar pi
-Dignite.job.shared.classloader=false 5 10
Notice that here the -Dkey=value pairs must be placed between the program
name (pi) and its arguments (5, 10).
In case of Hive similar properties can be passed in using --hiveconf hive
client option, like "hive ... --hiveconf ignite.job.shared.classloader=false
... " .

4. You see OOME (OutOfMemoryError) related to insufficient heap. Heap size
is set via JVM -Xms (initial) and -Xmx (maximum) options. Assumed way to
configure those options for Ignite is to use -J prefix : anything following
-J will be passed to Ignite JVM, e.g. command "./ignite.sh -v -J-Xms4g
-Xmx4g" gives Ignite JVM 4g of initial and 4g of max heap.
Off-heap memory parameters are managed differently, in the Ignite's xml
config.

5. It is very problematic to provide all possible values , since one
configuration value may be set to many different value beans , and each of
them has its own properties. E.g.
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi#ipFinder property (in
the end of your default-config.xml) sets value
"org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder"
there, but there are also ~10 other impelmentations of
org.apache.ignite.spi.discovery.tcp.ipfinder.TcpDiscoveryIpFinder , and they
have different properties. 
Moreover, one can create own implementation with some ad-hoc properties.
So , the file "with all possible values" is really hard or impossible to
create.
To print the default config with all the values explicitly shown -- this is
doable, you can submit it as a feature-request.

6. IGFS can be used as a standalone file system (aka PRIMARY mode), and as a
caching layer on top of another file system (DUAL_SYNC, DUAL_ASYNC modes).
Regarding high availability: IGFS is not highly available . In case of a
dual mode it will re-read data from underlying file system layer on failure,
in primary mode a data loss is possible. Problem with starting HDFS is only
with global configs seen by HDFS daemons -- they should not specify IGFS as
a default file system.

7. Hive query will be transformed to a number of map-reduce tasks, each task
will run on Ignite execution engine. Ignite map-reduce is designed in such a
way that it does not "spill" intermediate data between map and reduce -- it
stores all them in memory (mainly offheap, and this is not caches offheap
you configure in default-config.xml). It is difficult to give exact answer
to yopur question in theory, exact limit can better be found experimentally.
Very rough estimation is that 1.5 -
 doubled file data size (uncompressed, 9G) should fit in memory on all
nodes. Note that offheap memory used by Ignite map-reduce is not limited in
configuration.   

8. You can monitor resources utilized by Ignite using
(1) its own diagnostic printouts (appear periodically in its console output,
when ran with "-v" option).
(2) using any Java monitoring tool, such as JConsole, VisualVM,
JavaFlightRecorder, etc.
But, what is important, offheap memory used in map-reduce will not be shown
in any of the tools listed above. It can be seen only as total amount of
memory used by the Ignite java process -- you can use a native (your OS
specific) process monitoring tool for that . 



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/HDP-Hive-Ignite-tp12195p12296.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: HDP, Hive + Ignite

Reply via email to