[Dev] Logging Implementation {was: Re: Any Possibility of defining the Hive output directory programmatically?}

2012-07-23 Thread Manisha Gayathri
Contacted Hive User Group as well on this matter. They also mentioned that this approach is not possible. Also, as per the chat I had with Buddhika, this kind of dynamic variable creation is not currently possible in the Hive that comes with BAM2. Therefore IMO, without going ahead with this

Re: [Dev] Logging Implementation {was: Re: Any Possibility of defining the Hive output directory programmatically?}

2012-07-23 Thread Tharindu Mathew
So through this custom Java task, what is the scale of log processing you will support? 100 MB, 1 GB, 100 GB, 1 TB? On Mon, Jul 23, 2012 at 5:14 PM, Manisha Gayathri mani...@wso2.com wrote: Contacted Hive User Group as well on this matter. They also mentioned that this approach is not possible.

Re: [Dev] Logging Implementation {was: Re: Any Possibility of defining the Hive output directory programmatically?}

2012-07-23 Thread Afkham Azeez
The requirement is simple. We need to generate log files on a per tenant, per date, per Service basis. Now as a big data analytics expert, please advise us on what is the best solution for this. Azeez On Mon, Jul 23, 2012 at 6:05 PM, Tharindu Mathew thari...@wso2.com wrote: So through this

Re: [Dev] Logging Implementation {was: Re: Any Possibility of defining the Hive output directory programmatically?}

2012-07-23 Thread Tharindu Mathew
I'm no expert, but I immediately question the scale of this approach. Do you have an idea of how much log data you plan to process per task? On Mon, Jul 23, 2012 at 6:13 PM, Afkham Azeez az...@wso2.com wrote: The requirement is simple. We need to generate log files on a per tenant, per date,

Re: [Dev] Logging Implementation {was: Re: Any Possibility of defining the Hive output directory programmatically?}

2012-07-23 Thread Afkham Azeez
Like you said, the task may not be the best way to do this. Like we discussed the other day, we can publish logs to unique column families which contain the Service_Tenant_Date as the unique identifier. We need to generate logs in a file format and allow tenant users to download them. What is the

Re: [Dev] Logging Implementation {was: Re: Any Possibility of defining the Hive output directory programmatically?}

2012-07-23 Thread Tharindu Mathew
insert select * from foo On Mon, Jul 23, 2012 at 7:15 PM, Afkham Azeez az...@wso2.com wrote: On Mon, Jul 23, 2012 at 6:41 PM, Tharindu Mathew thari...@wso2.com wrote: If you are planning to do a few MB, that would mean that the size of logs will be ( size of logs * no. of tenants ), so

Re: [Dev] Logging Implementation {was: Re: Any Possibility of defining the Hive output directory programmatically?}

2012-07-23 Thread Afkham Azeez
On Mon, Jul 23, 2012 at 6:41 PM, Tharindu Mathew thari...@wso2.com wrote: If you are planning to do a few MB, that would mean that the size of logs will be ( size of logs * no. of tenants ), so roughly for 200 active tenants and 2 MB of logs, it would come to around 400 MB. This is still
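The estimate quoted above (size of logs * no. of tenants) can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function name is hypothetical, and the 200 tenants and 2 MB figures are the ones given in the message.

```python
def total_log_volume_mb(active_tenants: int, mb_per_tenant: float) -> float:
    """Total log volume, assuming every active tenant produces the same amount."""
    return active_tenants * mb_per_tenant


# Figures from the message: 200 active tenants, 2 MB of logs each.
print(total_log_volume_mb(200, 2))  # 400
```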

Re: [Dev] Logging Implementation {was: Re: Any Possibility of defining the Hive output directory programmatically?}

2012-07-23 Thread Buddhika Chamith
Hi All, It is not impossible to inject runtime variables (a bit like query parameters in DSS) into Hive query execution, but it might take some modifications on the Hive side to make it possible to do so programmatically. Currently I am doing some work in Hive for making it tenant

Re: [Dev] Logging Implementation {was: Re: Any Possibility of defining the Hive output directory programmatically?}

2012-07-23 Thread Amani Soysa
On Mon, Jul 23, 2012 at 7:21 PM, Tharindu Mathew thari...@wso2.com wrote: insert select * from foo File names should be dynamically generated according to the tenant id as well as the service name (e.g. T1_P1_T1.gz), and also when selecting data we need to retrieve column families dynamically

Re: [Dev] Logging Implementation {was: Re: Any Possibility of defining the Hive output directory programmatically?}

2012-07-23 Thread Buddhika Chamith
So if I understand right, the data is stored in separate column families for each tenant, server and day, and the requirement is to transfer this column family data directly to a flat file which corresponds to the logs from a tenant for a server on a given day, with no analytics involved. If it is the

Re: [Dev] Logging Implementation {was: Re: Any Possibility of defining the Hive output directory programmatically?}

2012-07-23 Thread Manisha Gayathri
If I give a more clear picture into the scenario; We have separate column families for each tenant, server and day. Eg: log_0_esbserver_2012_07_23 log_1_esbserver_2012_07_23 log_2_esbserver_2012_07_23 log_0_esbserver_2012_07_24 log_2_appserver_2012_07_24 log_3_appserver_2012_07_24 (0,1,2..
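The example names above appear to follow the pattern log_<tenant>_<server>_<yyyy>_<mm>_<dd>. As a minimal sketch, assuming that pattern holds and that the first number is the tenant id, the names could be built and parsed as below. Both function names are hypothetical and are not part of BAM2 or Hive.

```python
from datetime import date


def column_family_name(tenant_id: int, server: str, day: date) -> str:
    """Build a name such as log_0_esbserver_2012_07_23 (assumed pattern)."""
    return f"log_{tenant_id}_{server}_{day:%Y_%m_%d}"


def parse_column_family_name(name: str) -> tuple[int, str, date]:
    """Split an assumed-pattern name back into (tenant id, server, day)."""
    prefix, tenant_id, rest = name.split("_", 2)
    if prefix != "log":
        raise ValueError(f"unexpected column family name: {name!r}")
    # Split the date off from the right so a server name may contain underscores.
    server, year, month, day = rest.rsplit("_", 3)
    return int(tenant_id), server, date(int(year), int(month), int(day))


print(column_family_name(0, "esbserver", date(2012, 7, 23)))
print(parse_column_family_name("log_2_appserver_2012_07_24"))
```

Parsing the date from the right keeps the sketch tolerant of server names that contain underscores, which the original messages do not rule out.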

Re: [Dev] Logging Implementation {was: Re: Any Possibility of defining the Hive output directory programmatically?}

2012-07-23 Thread Buddhika Chamith
Please find my comments inline. On Mon, Jul 23, 2012 at 8:35 PM, Manisha Gayathri mani...@wso2.com wrote: If I give a more clear picture into the scenario; We have separate column families for each tenant, server and day. Eg: log_0_esbserver_2012_07_23 log_1_esbserver_2012_07_23

Re: [Dev] Logging Implementation {was: Re: Any Possibility of defining the Hive output directory programmatically?}

2012-07-23 Thread Manisha Gayathri
Hi Buddhika, Thanks for the detailed solution. Yes, I don't see any issue in this approach. That is exactly the Hive query that I am using in my script. But the issue is with the other parts (fetching relevant columns, creating respective directories, etc.). But in a discussion about the