Hi,

In our application, Hive is used as a database, i.e. the result set of a SELECT query is consumed outside the Hadoop cluster. The consumption process is not Hadoop-friendly, in that it is network-bound rather than CPU- or disk-bound.

I'm in the process of converting the Hive query into a Pig query to see if it reads better. What I'm stuck on is picking out the contents of a specific alias's DUMP from all the other output being logged, so that I can trigger further processing. STREAM <alias> THROUGH <cmd> seems to be one way to trigger a process, but it does not seem suitable for the kind of process we have in mind, because <cmd> runs inside the Hadoop cluster.

Any thoughts?

J
