Re: Fwd: Partition and Split rows

Casey Ching Tue, 17 May 2016 14:15:06 -0700

I’ll talk with Dan. I’m guessing he’s referring to some Kudu timestamp 
features, but I’m not sure. I am sure that Impala doesn’t yet support Kudu’s 
timestamps. I just filed an issue to add that support.



On May 17, 2016 at 11:47:30 AM, Amit Adhau (amit.ad...@globant.com) wrote:

Hi Casey,

As per Dan's reply below, it is supported in Kudu 0.8, so I'm a bit confused 
now. Can you please confirm, so that we can either use int64 in Impala if it 
is not supported, or use timestamp (which currently gives an error in 0.8)?

Q: Are you going to provide the Kimpala merge or Kudu timestamp support in 
Impala/Tableau in the near future?


TIMESTAMP typed columns should work in Impala/Kudu as of the 0.8 release (the 
latest); however, I know there were a few bugs in the previous releases.  If 
you're using the latest and are still seeing errors, please send the error 
and/or file an issue on JIRA.

Thanks,
Amit

On May 18, 2016 12:03 AM, "Casey Ching" <ca...@cloudera.com> wrote:
Hi Amit,

Impala doesn’t yet support Kudu’s timestamps. I think the best solution we have 
for now is to use Unix-time-style timestamp values. Impala has functions to 
convert between ints and timestamps (for example, CAST can be used).
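
As a rough illustration of the Unix-time approach, the sketch below converts a 
timestamp to whole seconds since the epoch (the kind of value that would go 
into a BIGINT column) and back again. The function names are made up for this 
example, and the choice of seconds rather than milliseconds or microseconds is 
an assumption; check which unit the CAST in your Impala release expects.

```python
from datetime import datetime, timezone


def to_unix_seconds(ts: datetime) -> int:
    """Convert a timezone-aware datetime to whole seconds since the Unix epoch."""
    return int(ts.timestamp())


def from_unix_seconds(seconds: int) -> datetime:
    """Convert whole seconds since the Unix epoch back to a UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Hypothetical event time; the integer is what would be stored in the table.
event = datetime(2016, 5, 17, 21, 15, 6, tzinfo=timezone.utc)
stored = to_unix_seconds(event)
restored = from_unix_seconds(stored)
print(stored, restored.isoformat())
```

Sub-second precision is dropped by this conversion, so a finer unit would be 
needed if the data requires it.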

Casey

On May 17, 2016 at 12:26:18 AM, Amit Adhau (amit.ad...@globant.com) wrote:

Thanks Dan,

In our CDH 5.7 cluster, we are using Kudu parcel "0.8.0-1.kudu0.8.0.p0.11" and 
Impala_Kudu parcel "2.6.0-1.cdh5.8.0.p0.17".
We have a table created in Kudu using Java code, as per below:


CREATE EXTERNAL TABLE `tablename` (
  `long_value` BIGINT,
  `timestamp_value` TIMESTAMP,
  `string_value` STRING,
  `addrmetric` STRING
)
TBLPROPERTIES(
  'storage_handler' = 'com.cloudera.kudu.hive.KuduStorageHandler',
  'kudu.table_name' = 'tablename',
  'kudu.master_addresses' = 'cluster:7051',
  'kudu.key_columns' = 'long_value, timestamp_value'
);

But when we try to create the same table in Impala, we get the error below:

May 17, 4:00:51.477 AM  INFO    jni-util.cc:177
com.cloudera.impala.common.ImpalaRuntimeException: Type TIMESTAMP is not supported in Kudu
        at com.cloudera.impala.util.KuduUtil.fromImpalaType(KuduUtil.java:251)
        at com.cloudera.impala.util.KuduUtil.compareSchema(KuduUtil.java:67)
        at com.cloudera.impala.catalog.delegates.KuduDdlDelegate.createTable(KuduDdlDelegate.java:87)
        at com.cloudera.impala.service.CatalogOpExecutor.createTable(CatalogOpExecutor.java:1516)
        at com.cloudera.impala.service.CatalogOpExecutor.createTable(CatalogOpExecutor.java:1365)
        at com.cloudera.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:249)
        at com.cloudera.impala.service.JniCatalog.execDdl(JniCatalog.java:131)

May 17, 4:00:51.479 AM  INFO    status.cc:112
ImpalaRuntimeException: Type TIMESTAMP is not supported in Kudu

May 17, 4:00:51.479 AM  ERROR   catalog-server.cc:64
ImpalaRuntimeException: Type TIMESTAMP is not supported in Kudu

Can you please help us with this?

Thanks,

Amit
---------- Forwarded message ----------
From: Dan Burkert <d...@cloudera.com>
Date: Tue, May 17, 2016 at 2:02 AM
Subject: Re: Partition and Split rows
To: user@kudu.incubator.apache.org


Hi Amit, responses inline
 
Q: Can I fetch the Kudu Timestamp data from Tableau/Pentaho for reporting and 
analytics purposes, or do I need the Int64 datatype only?

Are Tableau/Pentaho using Impala to query?  If so, see the answers below.
 
Q: Are you going to provide the Kimpala merge or Kudu timestamp support in 
Impala/Tableau in the near future?
 
TIMESTAMP typed columns should work in Impala/Kudu as of the 0.8 release (the 
latest); however, I know there were a few bugs in the previous releases.  If 
you're using the latest and are still seeing errors, please send the error 
and/or file an issue on JIRA.
 
Q: At the moment, instead of Timestamp we are using the Int64 type in Kudu. 
Will the partitioning and split-row explanation given for the timestamp 
datatype be equally helpful for performance? E.g., can we get better 
performance for a composite key on a given Kudu table defined as 'metric(s), 
Int64 (holding timestamp data)', with a partition on the Int64 column and a 
Split By clause?

Internally, the TIMESTAMP type is just an alias to INT64, so it has the exact 
same performance characteristics.  The only difference is how the values are 
displayed in log messages and on the web UI.
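
To make the equivalence concrete, here is a small sketch that encodes 
timestamps as plain 64-bit integers and checks that sorting by the integer 
gives the same order as sorting by time, which is why range splits on an INT64 
column behave the same way. The microsecond unit and the helper name 
`to_micros` are assumptions made for this illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_micros(ts: datetime) -> int:
    """Encode a timezone-aware datetime as integer microseconds since the epoch."""
    return (ts - EPOCH) // timedelta(microseconds=1)


# Hypothetical, deliberately unsorted sample readings.
readings = [
    datetime(2016, 5, 17, 2, 2, tzinfo=timezone.utc),
    datetime(2016, 5, 6, 14, 0, tzinfo=timezone.utc),
    datetime(2016, 5, 7, 21, 20, tzinfo=timezone.utc),
]
encoded = [to_micros(t) for t in readings]

# Sorting by the integer encoding yields chronological order.
by_int = [t for _, t in sorted(zip(encoded, readings))]
print(by_int == sorted(readings))
```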

- Dan
 
On Sat, May 7, 2016 at 9:20 PM, Dan Burkert <d...@cloudera.com> wrote:
Hi Sand,

I've been working on some diagrams to help explain some of the more advanced 
partitioning types; they're attached.  They're still pretty rough at this 
point, but the goal is to clean them up and move them into the Kudu 
documentation proper.  I'm interested to hear what kind of time series you are 
interested in using Kudu for.  I'm tasked with improving Kudu for time series; 
you can follow progress here.  If you have any additional ideas, I'd love to 
hear them.  You may also be interested in a small project that J-D and I have 
been working on over the past week to build an OpenTSDB-style store on top of 
Kudu; you can find it here.  It's still quite feature-limited at this point.

- Dan

On Fri, May 6, 2016 at 4:51 PM, Sand Stone <sand.m.st...@gmail.com> wrote:
Thanks. Will read. 

Given that I am researching time series data, row locality is crucial :-)  

On Fri, May 6, 2016 at 3:57 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
We do have non-covering range partitions coming in the next few months, here's 
the design (in review): 
http://gerrit.cloudera.org:8080/#/c/2772/9/docs/design-docs/non-covering-range-partitions.md

The "Background & Motivation" section should give you a good idea of why I'm 
mentioning this.

Meanwhile, if you don't need row locality, using hash partitioning could be 
good enough.
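
A minimal sketch of that trade-off: hash partitioning assigns each key to a 
bucket by hashing it, so consecutive timestamps are scattered across buckets 
instead of staying together. `zlib.crc32` is only a stand-in here (it is not 
the hash function Kudu uses), and the bucket count and key range are made up.

```python
import zlib


def hash_bucket(key: int, num_buckets: int) -> int:
    """Map an int64 key to a bucket using a stand-in hash (not Kudu's own)."""
    encoded = key.to_bytes(8, "big", signed=True)
    return zlib.crc32(encoded) % num_buckets


NUM_BUCKETS = 4  # hypothetical bucket count
start = 1462575600  # hypothetical first timestamp, in seconds

# Count how many of 1000 consecutive timestamps land in each bucket.
counts = [0] * NUM_BUCKETS
for ts in range(start, start + 1000):
    counts[hash_bucket(ts, NUM_BUCKETS)] += 1

print(counts)
```

Writes are spread across the buckets, but a scan over a contiguous time range 
has to visit every bucket, which is the loss of row locality mentioned above.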

J-D

On Fri, May 6, 2016 at 3:53 PM, Sand Stone <sand.m.st...@gmail.com> wrote:
Makes sense. 

Yeah it would be cool if users could specify/control the split rows after the 
table is created. Now, I have to "think ahead" to pre-create the range buckets. 

On Fri, May 6, 2016 at 3:49 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
You will only get 1 tablet and no data distribution, which is bad.

That's also how HBase works, but it will split regions as you insert data and 
eventually you'll get some data distribution even if it doesn't start in an 
ideal situation. Tablet splitting will come later for Kudu.
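
A small sketch of why that is: with range partitioning, N split rows divide 
the key space into N + 1 tablets, so with no split rows every key falls into 
the same tablet. The function below is a simplified model, not Kudu code; it 
assumes each split row is the inclusive lower bound of the tablet that follows 
it, and the name `range_tablet` is made up.

```python
from bisect import bisect_right
from typing import Sequence


def range_tablet(key: int, split_rows: Sequence[int]) -> int:
    """Return the index of the tablet that owns `key`.

    `split_rows` must be sorted; each split row is treated as the first
    key of the next tablet.
    """
    return bisect_right(split_rows, key)


# No split rows: every key maps to tablet 0.
print({range_tablet(k, []) for k in (0, 50, 10**12)})

# Two split rows: three tablets.
splits = [100, 200]
print([range_tablet(k, splits) for k in (99, 100, 199, 200)])
```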

J-D

On Fri, May 6, 2016 at 3:42 PM, Sand Stone <sand.m.st...@gmail.com> wrote:
One more question: how does the range partition work if I don't specify the 
split rows? 

Thanks! 

On Fri, May 6, 2016 at 3:37 PM, Sand Stone <sand.m.st...@gmail.com> wrote:
Thanks, Misty. The "advanced" impala example helped. 

I was just reading the Java API (CreateTableOptions.java), and it's unclear how 
the range partition column names are associated with the partial row params in 
the addSplitRow API.

On Fri, May 6, 2016 at 3:08 PM, Misty Stanley-Jones 
<mstanleyjo...@cloudera.com> wrote:
Hi Sand,

Please have a look at 
http://getkudu.io/docs/kudu_impala_integration.html#partitioning_tables and see 
if it is helpful to you.

Thanks,
Misty

On Fri, May 6, 2016 at 2:00 PM, Sand Stone <sand.m.st...@gmail.com> wrote:
Hi, I am new to Kudu. I wonder how the split rows work. I know from some docs 
that this is currently done when pre-creating the table. I am researching how 
to partition (hash+range) some time series test data. 

Is there an example, or notes somewhere that I could read up on? 

Thanks much. 

--
Thanks & Regards,
Amit Adhau | Data Architect

GLOBANT | IND:+91 9821518132

The information contained in this e-mail may be confidential. It has been sent 
for the sole use of the intended recipient(s). If the reader of this message is 
not an intended recipient, you are hereby notified that any unauthorized 
review, use, disclosure, dissemination, distribution or copying of this 
communication, or any of its contents, is strictly prohibited. If you have 
received it by mistake please let us know by e-mail immediately and delete it 
from your system. Many thanks.
 




