Bertrand,
Sorry, I don't have a link to msck documentation. I haven't tried it myself; I have only heard of it.

Thanks

From: Bertrand Dechoux
Sent: Wednesday, July 25, 2012 1:23 PM
Subject: Re: Continuous log analysis requires 'dynamic' partitions, is that possible?

Usage of msck:

msck table <table>
msck repair table <table>

BUT that won't help me. I am using an external table with 'external' partitions (which do not follow Hive conventions). So I first create an external table without a location, and then I specify every partition with an absolute location. I don't think there is another way given my constraints, but if there is, I will gladly read about it.

So, with the current implementation and with regard to the parameters that can be used with the current Hive commands:
1) Hive has no way to list the table directory.
2) Hive has no way to understand which variable should be used for each partitioning level.

Conclusion: the only solution at the moment is to declare partitions for Hive in advance (thanks Edward). But that means I do have to handle the 'synchronisation' of two 'pseudo' file trees: the HDFS directories and the Hive partitions.

Bertrand

On Wed, Jul 25, 2012 at 10:51 AM, Bertrand Dechoux wrote:

@Puneet Khatod: I found that out, and that's why I am asking here. I guess non-AWS users might have the same problem and a way to solve it.

@Ruslan Al-fakikh: It seems great. Is there any documentation for msck? I will find out from the diff file, but is there a wiki page or a blog post about it? That would be best. I could not find any.

@Edward Capriolo: I now feel silly. This is clearly a better approach than my proposed hacks. The performance impact should be negligible, even more so when partition pruning is ensured. I am using Hive to 'piggyback' on an external way of writing data. So in my case, I could indeed tell Hive in advance where the data will be written. (Same as you say, but the logic is reversed.)
I guess I skipped over alter table touch. But it would not help me: the partitions are external. And if I add partitions, I will do it with cron and a shell script.

Bertrand

On Tue, Jul 24, 2012 at 7:24 PM, Edward Capriolo wrote:

Alter table touch will create partitions even if they have no data. You can also just create partitions ahead of time and have your code "know" where to write data.

On Tue, Jul 24, 2012 at 12:35 PM, Ruslan Al-fakikh wrote:
> If you are not using Amazon, take a look at this:
>
> https://issues.apache.org/jira/browse/HIVE-874
>
> Ruslan
>
> From: Puneet Khatod
> Sent: Tuesday, July 24, 2012 8:32 PM
> Subject: RE: Continuous log analysis requires 'dynamic' partitions, is that possible?
>
> If you are using Amazon (AWS), you can use 'recover partitions' to enable all top-level partitions.
> This will add the required dynamicity.
>
> Regards,
> Puneet Khatod
>
> From: Bertrand Dechoux
> Sent: 24 July 2012 21:15
> Subject: Continuous log analysis requires 'dynamic' partitions, is that possible?
>
> Hi,
>
> Let's say logs are stored inside HDFS using the following file tree: /<logtype>/<month>/<day>.
> So for apache, that would be:
> /apache/01/01
> /apache/01/02
> ...
> /apache/02/01
> ...
>
> I would like to know how to define a table for this information. I found out that the table should be external and should use partitions. However, I did not find any way to create the partitions dynamically. Is there no automatic way to define them?
> In that case, the partition 'template' would be <month>/<day> with the root being apache.
>
> I know how to 'hack a fix': create a script which would generate all the "add partition" statements and run them without caring about the results, because partitions may not exist or may already have been added. Better, I could parse the result of 'show partitions' for the table and run only the relevant statements, but it still feels like a hack.
>
> Is there any clean way to do it?
>
> Regards,
>
> Bertrand Dechoux

--
Bertrand Dechoux
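The approach the thread settles on (declare external partitions explicitly, and only add the ones Hive does not know about yet by diffing against 'show partitions') can be sketched roughly as follows. This is a minimal sketch under assumptions that are not in the thread: the table name `apache_logs`, the partition columns `month` and `day`, and all helper names are hypothetical. The script only generates HiveQL text. It does not talk to HDFS or the metastore, so the caller must supply the directory listing (e.g. from `hadoop fs -ls`) and the SHOW PARTITIONS output (e.g. from `hive -e`). The resulting statements could then be run by the Hive CLI from cron.

```python
import re
from datetime import date, timedelta

# Layout from the thread: /<logtype>/<month>/<day>, e.g. /apache/01/02.
PATH_RE = re.compile(r"^/(?P<logtype>[^/]+)/(?P<month>\d{2})/(?P<day>\d{2})/?$")


def partition_spec(path):
    """Map an HDFS directory like /apache/01/02 to (logtype, month, day), or None."""
    m = PATH_RE.match(path)
    if m is None:
        return None
    return m.group("logtype"), m.group("month"), m.group("day")


def parse_show_partitions(lines):
    """Parse SHOW PARTITIONS output such as 'month=01/day=02' into {(month, day)}."""
    existing = set()
    for line in lines:
        fields = dict(p.split("=", 1) for p in line.strip().split("/") if "=" in p)
        if "month" in fields and "day" in fields:
            existing.add((fields["month"], fields["day"]))
    return existing


def add_partition_statements(table, logtype, hdfs_dirs, show_partitions_lines):
    """Return ADD PARTITION statements only for directories Hive does not know yet.

    `table` and the month/day column names are hypothetical placeholders.
    """
    existing = parse_show_partitions(show_partitions_lines)
    statements = []
    for path in sorted(hdfs_dirs):
        spec = partition_spec(path)
        if spec is None or spec[0] != logtype:
            continue  # not part of this table's tree
        _, month, day = spec
        if (month, day) in existing:
            continue  # already declared in the metastore
        statements.append(
            f"ALTER TABLE {table} ADD PARTITION (month='{month}', day='{day}') "
            f"LOCATION '{path.rstrip('/')}';"
        )
    return statements


def upcoming_day_dirs(logtype, start, days):
    """Directories for the next `days` days, to declare partitions ahead of time."""
    return [
        f"/{logtype}/{d:%m}/{d:%d}"
        for d in (start + timedelta(days=i) for i in range(days))
    ]


if __name__ == "__main__":
    dirs = ["/apache/01/01", "/apache/01/02", "/apache/02/01"]
    known = ["month=01/day=01"]  # pretend SHOW PARTITIONS output
    for stmt in add_partition_statements("apache_logs", "apache", dirs, known):
        print(stmt)
```

With the sample input, only the partitions for /apache/01/02 and /apache/02/01 are emitted, because month=01/day=01 is already declared. `upcoming_day_dirs` covers Edward's variant of declaring partitions before the data is written: feed its output to `add_partition_statements` in place of a real directory listing. Because the layout in the thread has no year level, a month/day pair would be reused every year. The sketch inherits that limitation rather than solving it.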
