¬¬'

-----Mensagem original-----
De: Raj Hadoop [mailto:hadoop...@yahoo.com]
Enviada em: quarta-feira, 10 de setembro de 2014 15:42
Para: user@hive.apache.org
Assunto: Re: Remove duplicate records in Hive

Thanks. I will try it.
--------------------------------------------
On Wed, 9/10/14, Nishant Kelkar <nishant....@gmail.com> wrote:

 Subject: Re: Remove duplicate records in Hive
 To: user@hive.apache.org, hadoop...@yahoo.com
 Date: Wednesday, September 10, 2014, 1:59 PM

 Hi
 Raj,
 You can do something
 along these lines:

 SELECT
 cno, sqno, SORT_ARRAY(COLLECT_SET(date))[0] AS latest_date  FROM table GROUP 
BY cno, sqno;  However, you have to make sure your  date format is such that 
sorting it gives you the most  recent date. The best way to do that is to have 
it in
 format: YYYY-MM-DD.
 Hope this helps.
 Best Regards,Nishant
 Kelkar
 On Wed, Sep 10, 2014 at
 10:04 AM, Raj Hadoop <hadoop...@yahoo.com>
 wrote:


 Hi,



 I have a requirement in Hive to remove duplicate records (  they differ only 
by one column i.e a date column) and keep  the latest date record.



 Sample :

 Hive Table :

  d2 is a higher

 cno,sqno,date



 100 1 1-oct-2013

 101 2 1-oct-2013

 100 1 2-oct-2013

 102 2 2-oct-2013





 Output needed:



 100 1 2-oct-2013

 101 2 1-oct-2013

 102 2 2-oct-2013



 I am using Hive 0.11



 Any suggestions please ?



 Regards,

 Raj




________________________________
Esta mensagem pode conter informações confidenciais e somente o indivíduo ou 
entidade a quem foi destinada pode utilizá-la. A transmissão incorreta da 
mensagem não acarreta a perda de sua confidencialidade. Caso esta mensagem 
tenha sido recebida por engano, solicitamos que o fato seja comunicado ao 
remetente e que a mensagem seja eliminada de seu sistema imediatamente. É 
vedado a qualquer pessoa que não seja o destinatário usar, revelar, distribuir 
ou copiar qualquer parte desta mensagem. Ambiente de comunicação sujeito a 
monitoramento.

This message may include confidential information and only the intended 
addresses have the right to use it as is, or any part of it. A wrong 
transmission does not break its confidentiality. If you've received it because 
of a mistake or erroneous transmission, please notify the sender and delete it 
from your system immediately. This communication environment is controlled and 
monitored.

B2W Digital

Reply via email to