I am using HDInsight, and can issue table commands to my MongoDB system, but
its not acting like i think it should.I realize that when i create an external
table in hive, and give it mongodb properties, it is a shell of a table within
hive, and that the real table resides in MongoDB. If I drop the hive table, it
drops the shell, as it were, leaving the table and data in mongo alone.I have
read that I can truncate said data in MongoDB from HIVE, but have never gotten
this to work.I am using the following JARS to interact with Mongo from
HDInsight via HIVE HQL Queries. ADD JAR
wasb://[email protected]/mongo-java-driver-2.13.0.jar;
ADD JAR
wasb://[email protected]/mongo-hadoop-core-1.3.2.jar; ADD
JAR
wasb://[email protected]/mongo-hadoop-hive-1.3.2.jar; I
can issue a create if not exists command on a table and give it a schema. If
the table already exists in mongo, then all is wellIf the table, sorry,
collection, does not exist, then i get no error from hive system when i create
the table. But if i query the table in any way, i get errors.So i have taken to
issuing a create if not exists on the table from hive, then i drop the table,
which removes the mongo table and the hive table, then i recreate the table in
hive.The table doesnt actually exist in mongo yet, until data populates the
collection.
So, to ensure that my collections get unfilled and filled, i now ensure that en
empty collection exists in mongo, then issue a create statement from hive, then
drop, then create, then fill with data.And i can run this over and over and be
fine. As long as the collection exists in mongo, it will continue to empty and
fill with new data when process is ran.
What am i doing wrong.
------Some code
$MongoTable = "StudentUsage" switch ($Environment) { 'Test' { $TBLPROPERTIES =
"TBLPROPERTIES( 'mongo.uri' =
'mongodb://appAdmin:[email protected]:27017/test_reports.$($MongoTable)?authsource=admin&replicaSet=RS-Test-0'
);" } 'Prod' { $TBLPROPERTIES = "TBLPROPERTIES( 'mongo.uri' =
'mongodb://appAdmin:[email protected]:27017/production_reports.$($MongoTable)?authsource=admin&replicaSet=RS-Production-0');"
} 'RC' { $TBLPROPERTIES = "TBLPROPERTIES( 'mongo.uri' =
'mongodb://appAdmin:[email protected]:27017/rc_reports.$($MongoTable)?authsource=admin');"
} 'Demo' { $TBLPROPERTIES = "TBLPROPERTIES( 'mongo.uri' =
'mongodb://appAdmin:[email protected]:27017/demo_reports.$($MongoTable)?authsource=admin');"
} } #The location of the flattened json documents that contain student data
$DocTable = $Environment + "Docs" $JSonLocation =
"wasb://[email protected]/StudentData/$DocTable"; #Hive table that
will pull out Activity Counted data about students $DocsActivityCounted =
$Environment + "DocsActivityCounted"; #Hive table that will have properties
indicating it is a Mongodb table $StudentUsage = $Environment + "StudentUsage";
$StudentUsageH = $Environment + "StudentUsageHive";
$Query = "
ADD JAR
wasb://[email protected]/json-serde-1.3-jar-with-dependencies.jar;
ADD JAR
wasb://[email protected]/mongo-java-driver-2.13.0.jar;
ADD JAR
wasb://[email protected]/mongo-hadoop-core-1.3.2.jar; ADD
JAR wasb://[email protected]/mongo-hadoop-hive-1.3.2.jar;
--This will create a HIVE table that has properties that make it a Mongo table.
create table if not exists $StudentUsage ( studentid string, collectedyear
int, collectedweek int, elapsedseconds float ) stored by
'com.mongodb.hadoop.hive.MongoStorageHandler' $TBLPROPERTIES select count(*)
from $StudentUsage; --But we need to drop it after it was created, so that the
mongo table actually is dropped. drop table if exists $StudentUsage; --Then we
need to create it again, so that it is empty and ready to be filled by the
insertoverwrite, which only fills, not overwrites. create table if not exists
$StudentUsage ( studentid string, collectedyear int, collectedweek int,
elapsedseconds float ) stored by 'com.mongodb.hadoop.hive.MongoStorageHandler'
$TBLPROPERTIES insert overwrite table $StudentUsage select studentid,
from_unixtime(unix_timestamp(datetime, 'yyyy-MM-dd'),'Y') as year,
from_unixtime(unix_timestamp(datetime, 'yyyy-MM-dd'),'w') as weekofyear,
sum(elapsedseconds) from $StudentUsageH GROUP BY studentid,
from_unixtime(unix_timestamp(datetime, 'yyyy-MM-dd'),'Y'),
from_unixtime(unix_timestamp(datetime, 'yyyy-MM-dd'),'w') order by year,
weekofyear, studentid
TJay