Re: HIVE SparkSQL
Hi,

I need to compute some statistics on player events in the game, such as:

- How many players stay in game scene 1 ("Save the Princess from a Dragon")
- How much money they have paid in the last 5 minutes
- How many players pay money to get through this scene more easily
- The age distribution of those players
- The gender distribution of those players
- How many players have not logged in to the game for 5 days after they got through this scene

The log file is pre-formatted so that it can be loaded into MySQL directly:

RoleLevelUp|1426251269733|5503232ae4b00f39751f1012|2015-03-14 02:22:46|192.168.1.16|1048630|220|0|2|57|1993|
RoleLevelUp|1426251269734|5503232ae4b00f39751f1012|2015-03-14 02:22:52|192.168.1.16|1048630|水奈坤|0|0|3|67|1999|
RoleLevelUp|1426251269735|550329f9e4b00f39751f101d|2015-03-14 02:24:57|192.168.1.137|1048631|z12|0|0|41|0|380|
RoleLevelUp|1426251269736|5503232ae4b00f39751f1012|2015-03-14 02:39:01|192.168.1.16|1048630|水奈坤|0|0|15|0|2968

MySQL can no longer satisfy our analysis needs, so we want to use another technology to rebuild all of our statistics systems.

Thanks
Best regards
Yours,
Meng
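For anyone prototyping against this log format, here is a minimal Python sketch that parses the sample records. The message does not say what the columns mean, so the field names used here (`record_id`, `account_id`, `role_id`, `role_name`) are guesses made for illustration only; just the event type, the timestamp and the IP address are evident from the data.

```python
from collections import Counter
from datetime import datetime

# Sample records copied from the message.
SAMPLE = """\
RoleLevelUp|1426251269733|5503232ae4b00f39751f1012|2015-03-14 02:22:46|192.168.1.16|1048630|220|0|2|57|1993|
RoleLevelUp|1426251269734|5503232ae4b00f39751f1012|2015-03-14 02:22:52|192.168.1.16|1048630|水奈坤|0|0|3|67|1999|
RoleLevelUp|1426251269735|550329f9e4b00f39751f101d|2015-03-14 02:24:57|192.168.1.137|1048631|z12|0|0|41|0|380|
RoleLevelUp|1426251269736|5503232ae4b00f39751f1012|2015-03-14 02:39:01|192.168.1.16|1048630|水奈坤|0|0|15|0|2968
"""


def parse_line(line):
    """Split one pipe-delimited record into a dict.

    Field names other than event, time and ip are assumptions.
    """
    line = line.strip()
    if line.endswith("|"):
        line = line[:-1]  # some records carry a trailing delimiter
    fields = line.split("|")
    return {
        "event": fields[0],
        "record_id": fields[1],    # assumed: a sequence number
        "account_id": fields[2],   # assumed
        "time": datetime.strptime(fields[3], "%Y-%m-%d %H:%M:%S"),
        "ip": fields[4],
        "role_id": fields[5],      # assumed
        "role_name": fields[6],    # assumed
        "rest": fields[7:],        # meaning not stated in the message
    }


records = [parse_line(l) for l in SAMPLE.splitlines() if l.strip()]

# Number of events seen for each (assumed) role id.
events_per_role = Counter(r["role_id"] for r in records)
```

Here `events_per_role` is a simple tally by the sixth column; any real statistic would depend on what the remaining columns actually hold.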
Re: HIVE SparkSQL
Hello,

Depending on your needs, a search technology such as SolrCloud or ElasticSearch may make more sense. If you go for the Cassandra solution, you can use the Lucene text indexer... I am not sure whether Hive or SparkSQL is very suitable for text. However, if you do not need text search, feel free to go for them. What kind of statistics / aggregates do you want to get out of your logs?

Best regards

On 18 March 2015 at 04:29, "宫勐" wrote:

> Hi,
>
> I need to migrate a log analysis system from MySQL plus a C++ real-time
> computing framework to the Hadoop ecosystem.
>
> I want to build a data warehouse, but I don't know which one is the
> right choice. Cassandra? Hive? Or just SparkSQL?
>
> There are few benchmarks for these systems.
>
> My scenario is as follows:
>
> Every 5 seconds, Flume transfers a log file from an IDC. The log
> file is pre-formatted for loading into MySQL. There are many IDCs, and
> they shut down or reconnect to Flume at random.
>
> Every online IDC must receive an analysis of its logs every 5 minutes.
>
> Any suggestions?
>
> Thanks
> Yours
> Meng
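To make the question about aggregates concrete, a per-5-minute summary of the kind described in the quoted scenario can be written as an ordinary SQL `GROUP BY`. The sketch below uses SQLite through Python's `sqlite3` module purely so that it runs anywhere; it is a stand-in, not Hive or SparkSQL, and date functions differ between engines. The table name `role_level_up` and the columns `event_time` and `role_id` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding two columns taken from the sample log records.
conn.execute("CREATE TABLE role_level_up (event_time TEXT, role_id INTEGER)")
conn.executemany(
    "INSERT INTO role_level_up VALUES (?, ?)",
    [
        ("2015-03-14 02:22:46", 1048630),
        ("2015-03-14 02:22:52", 1048630),
        ("2015-03-14 02:24:57", 1048631),
        ("2015-03-14 02:39:01", 1048630),
    ],
)

# Floor each timestamp to a 300-second bucket, then aggregate per bucket.
QUERY = """
SELECT
    CAST(strftime('%s', event_time) AS INTEGER) / 300 * 300 AS window_epoch,
    COUNT(*)                AS events,
    COUNT(DISTINCT role_id) AS players
FROM role_level_up
GROUP BY window_epoch
ORDER BY window_epoch
"""

result = conn.execute(QUERY).fetchall()
for window_epoch, events, players in result:
    print(window_epoch, events, players)
```

Each output row holds the start of a 5-minute window (as Unix seconds), the number of events in it, and the number of distinct role ids. The same shape of query carries over to other SQL engines once the timestamp arithmetic is adapted.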