Contacted Hive User Group as well on this matter.
They also mentioned that this approach is not possible.
Also, as per the chat I had with Buddhika, right now this kind of dynamic
variable creation is not possible in the Hive that comes with BAM2.
Therefore IMO, without going ahead with this
So through this custom Java task, what is the scale of log processing you
will support? 100 MB, 1 GB, 100 GB, 1 TB?
On Mon, Jul 23, 2012 at 5:14 PM, Manisha Gayathri mani...@wso2.com wrote:
Contacted Hive User Group as well on this matter.
They also mentioned that this approach is not possible.
The requirement is simple. We need to generate log files on a per tenant,
per date, per Service basis. Now as a big data analytics expert, please
advise us on what is the best solution for this.
Azeez
On Mon, Jul 23, 2012 at 6:05 PM, Tharindu Mathew thari...@wso2.com wrote:
So through this
I'm no expert, but I immediately question the scale of this approach.
Do you have an idea of how much log data you plan to process per task?
On Mon, Jul 23, 2012 at 6:13 PM, Afkham Azeez az...@wso2.com wrote:
The requirement is simple. We need to generate log files on a per tenant,
per date,
Like you said, the task may not be the best way to do this. Like we
discussed the other day, we can publish logs to unique column families
which contain the Service_Tenant_Date as the unique identifier. We
need to generate logs in a file format that allows tenant users to download
them. What is the
insert select * from foo
On Mon, Jul 23, 2012 at 7:15 PM, Afkham Azeez az...@wso2.com wrote:
On Mon, Jul 23, 2012 at 6:41 PM, Tharindu Mathew thari...@wso2.com wrote:
If you are planning to do a few MB, that would mean that the size of logs
will be ( size of logs * no. of tenants ), so
On Mon, Jul 23, 2012 at 6:41 PM, Tharindu Mathew thari...@wso2.com wrote:
If you are planning to do a few MB, that would mean that the size of logs
will be ( size of logs * no. of tenants ), so roughly for 200 active
tenants and 2 MB of logs, it would come to around 400 MB. This is still
Hi All,
It is not impossible to inject runtime variables (a bit like query
parameters in DSS) into Hive query execution, but it might take some
modifications on the Hive side to make it possible to do programmatically.
Currently I am doing some work in Hive for making it tenant
On Mon, Jul 23, 2012 at 7:21 PM, Tharindu Mathew thari...@wso2.com wrote:
insert select * from foo
File names should be dynamically generated according to the tenant id as
well as the service name (i.e. T1_P1_T1.gz), and when selecting data we
need to retrieve column families dynamically
So if I understand right, the data is stored in separate column families
per tenant, server and day, and the requirement is to transfer this column
family data directly to a flat file which corresponds to the logs from a
tenant for a server on a given day, with no analytics involved. If it is the
To give a clearer picture of the scenario:
We have separate column families for each tenant, server and day.
Eg:
log_0_esbserver_2012_07_23
log_1_esbserver_2012_07_23
log_2_esbserver_2012_07_23
log_0_esbserver_2012_07_24
log_2_appserver_2012_07_24
log_3_appserver_2012_07_24 (0,1,2..
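
The naming pattern in the examples above can be sketched as follows. This is only an illustration, assuming the pattern is log_<tenant id>_<server>_<yyyy>_<MM>_<dd> and that the number after "log_" is the tenant id; the function names are hypothetical and not part of BAM2 or Hive.

```python
from datetime import date


def column_family_name(tenant_id: int, server: str, day: date) -> str:
    """Build a column family name such as log_0_esbserver_2012_07_23.

    Assumes the pattern log_<tenant id>_<server>_<yyyy>_<MM>_<dd>,
    inferred from the examples in this thread.
    """
    return f"log_{tenant_id}_{server}_{day:%Y_%m_%d}"


def parse_column_family_name(name: str) -> tuple[int, str, date]:
    """Split a column family name back into (tenant id, server, day)."""
    parts = name.split("_")
    if len(parts) < 6 or parts[0] != "log":
        raise ValueError(f"unexpected column family name: {name!r}")
    tenant_id = int(parts[1])
    # Everything between the tenant id and the date is the server name,
    # so a server name containing underscores still round-trips.
    server = "_".join(parts[2:-3])
    year, month, day_of_month = (int(p) for p in parts[-3:])
    return tenant_id, server, date(year, month, day_of_month)


if __name__ == "__main__":
    print(column_family_name(2, "appserver", date(2012, 7, 24)))
    print(parse_column_family_name("log_0_esbserver_2012_07_23"))
```

A task that walks the column families could use the parsed tuple to decide which tenant, server and day a given family belongs to before writing its output file.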
Please find my comments inline.
On Mon, Jul 23, 2012 at 8:35 PM, Manisha Gayathri mani...@wso2.com wrote:
To give a clearer picture of the scenario:
We have separate column families for each tenant, server and day.
Eg:
log_0_esbserver_2012_07_23
log_1_esbserver_2012_07_23
Hi Buddhika,
Thanks for the detailed solution. Yes, I don't see any issue with this
approach. That is exactly the Hive query that I am using in my script. But
the issue is with the other parts (fetching the relevant columns, creating
the respective directories, etc.).
But in a discussion about the