Bertrand,

 

Sorry, I don't have a link to msck documentation. I haven't tried it myself,
I just heard of it.

 

Thanks

 

From: Bertrand Dechoux [mailto:[email protected]] 
Sent: Wednesday, July 25, 2012 1:23 PM
To: [email protected]
Subject: Re: Continuous log analysis requires 'dynamic' partitions, is that
possible?

 

usage of msck :
msck table <table>
msck repair table <table>

BUT that won't help me.

I am using an external table with 'external' partitions (which do not follow
hive conventions).
So I first create an external table without a location and then I specify
every partition with an absolute location.
I don't think there is another way given my constraints. But if there is, I
will gladly read it.
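
For illustration, here is a minimal Python sketch of how the per-partition statements could be generated for the /<logtype>/<month>/<day> layout. The table name apache_logs and the partition columns month and day are hypothetical, and the sketch assumes your Hive version accepts IF NOT EXISTS on ADD PARTITION; the output would still have to be fed to the hive CLI (for example from cron).

```python
from datetime import date, timedelta


def add_partition_statements(table, root, start, end):
    """Yield one ALTER TABLE ... ADD PARTITION statement per day in [start, end].

    `table`, and the partition column names `month` and `day`, are
    illustrative assumptions, not names taken from a real schema.
    """
    day = start
    while day <= end:
        month, dom = f"{day.month:02d}", f"{day.day:02d}"
        yield (
            f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION (month='{month}', day='{dom}') "
            f"LOCATION '{root}/{month}/{dom}';"
        )
        day += timedelta(days=1)


if __name__ == "__main__":
    # Print the statements for the first three days of January 2012.
    for stmt in add_partition_statements(
        "apache_logs", "/apache", date(2012, 1, 1), date(2012, 1, 3)
    ):
        print(stmt)
```

Declaring the partitions a few days ahead this way keeps the external writer and hive decoupled: the writer only has to honour the agreed paths.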

So, with the current implementation and with regard to the parameters that
can be used with the current hive commands :
1) hive has no way to list the table directory
2) hive has no way to understand which variable should be used for each
partitioning level

Conclusion : the only solution at the moment is to declare partitions for
hive in advance (thanks Edward). But that means that I do have to handle the
'synchronisation' of two 'pseudo' file trees : hdfs and hive partitions.
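
A rough Python sketch of that synchronisation, under stated assumptions: the directory listing and the 'show partitions' output are obtained elsewhere and passed in as plain text, and the partition columns are named month and day (hypothetical names). It only computes which directories are not yet declared as partitions.

```python
def parse_show_partitions(output):
    """Parse lines such as 'month=01/day=02' into (month, day) tuples."""
    found = set()
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        spec = dict(part.split("=", 1) for part in line.split("/"))
        found.add((spec["month"], spec["day"]))
    return found


def parse_hdfs_dirs(paths, root):
    """Keep only <root>/<month>/<day> directories, as (month, day) tuples."""
    found = set()
    prefix = root.rstrip("/") + "/"
    for path in paths:
        if not path.startswith(prefix):
            continue
        parts = path[len(prefix):].strip("/").split("/")
        if len(parts) == 2 and all(parts):
            found.add((parts[0], parts[1]))
    return found


def missing_partitions(hdfs_paths, show_partitions_output, root):
    """Return the (month, day) pairs present on hdfs but unknown to hive."""
    on_disk = parse_hdfs_dirs(hdfs_paths, root)
    declared = parse_show_partitions(show_partitions_output)
    return sorted(on_disk - declared)
```

Only the pairs returned by missing_partitions would need an "add partition" statement, so the script never has to rely on failing statements being ignored.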

Bertrand

On Wed, Jul 25, 2012 at 10:51 AM, Bertrand Dechoux <[email protected]>
wrote:

@Puneet Khatod : I found that out. And that's why I am asking here. I guess
non AWS users might have the same problems and a way to solve it.

@Ruslan Al-fakikh : It seems great. Is there any documentation for msck? I
will find out with the diff file but is there a wiki page or a blog post
about it? It would be best. I could not find any.

@Edward Capriolo : I now feel silly. This is clearly a better approach than
my proposed hacks. The performance impact should be negligible, even more
so when partition pruning applies. I am using hive to 'piggy back' on an
external way of writing data. So in my case, I could indeed tell hive in
advance where the data will be written. (Same as you say but the logic is
reversed.) I guess I skipped over alter table touch. But it would not help
me. The partitions are external. And if I add partitions, I will do it with
cron and a shell file.

Bertrand





On Tue, Jul 24, 2012 at 7:24 PM, Edward Capriolo <[email protected]>
wrote:

Alter table touch will create partitions even if they have no data.
You can also just create partitions ahead of time and have your code
"know" where to write data.



On Tue, Jul 24, 2012 at 12:35 PM, Ruslan Al-fakikh
<[email protected]> wrote:
> If you are not using Amazon take a look at this:
>
> https://issues.apache.org/jira/browse/HIVE-874
>
>
>
> Ruslan
>
>
>
> From: Puneet Khatod [mailto:[email protected]]
> Sent: Tuesday, July 24, 2012 8:32 PM
> To: [email protected]
> Subject: RE: Continuous log analysis requires 'dynamic' partitions, is that
> possible?
>
>
>
> If you are using Amazon (AWS), you can use 'recover partitions' to enable
> all top level partitions.
>
> This will add the dynamic behaviour you need.
>
>
>
> Regards,
>
> Puneet Khatod
>
>
>
> From: Bertrand Dechoux [mailto:[email protected]]
> Sent: 24 July 2012 21:15
> To: [email protected]
> Subject: Continuous log analysis requires 'dynamic' partitions, is that
> possible?
>
>
>
> Hi,
>
> Let's say logs are stored inside hdfs using the following file tree
> /<logtype>/<month>/<day>.
> So for apache, that would be :
> /apache/01/01
> /apache/01/02
> ...
> /apache/02/01
> ...
>
> I would like to know how to define a table for this information. I found out
> that the table should be external and should be using partitions.
> However, I did not find any way to dynamically create the partitions. Is
> there no automatic way to define them?
> In that case, the partition 'template' would be <month>/<day> with the root
> being apache.
>
> I know how to 'hack a fix' : create a script which would generate all the
> "add partition" statements and run the resulting statements without caring
> about the results because partitions may not exist or may already have been
> added. Better, I could parse the result of 'show partitions' for the table
> and run only the relevant statements but it still feels like a hack.
>
> Is there any clean way to do it?
>
> Regards,
>
> Bertrand Dechoux
>





-- 
Bertrand Dechoux




