Re: [ANNOUNCE] Welcoming Abhishek Chennaka as Kudu committer and PMC member

2023-02-22 Thread Todd Lipcon
Congrats on the recognition of your work, Abhishek! Todd On Wed, Feb 22, 2023, 6:15 PM 邓科 wrote: > Congrats Abhishek!!! > > On Thu, Feb 23, 2023 at 08:05, Yingchun Lai wrote: > > > Congrats! > > > > On Thu, Feb 23, 2023 at 03:07, Mahesh Reddy wrote: > > > > > Congrats Abhishek!!! Great work and well deserved! > > > > > > On

Re: Implications/downside of increasing rpc_service_queue_length

2020-04-20 Thread Todd Lipcon
Hi Mauricio, Sorry for the late reply on this one. Hope "better late than never" is the case here :) As you implied in your email, the main issue with increasing queue length to deal with queue overflows is that it only helps with momentary spikes. According to queueing theory (and intuition) if
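A toy model of the point above, that a longer queue only helps with momentary spikes. This is a minimal sketch, not Kudu's RPC code: `simulate_queue` and the request rates are hypothetical. In each tick, arrivals are admitted up to the queue capacity and then a fixed number of queued requests is served.

```python
def simulate_queue(arrivals, service_per_tick, capacity):
    """Toy bounded FIFO queue; returns (rejected, still_queued).

    arrivals[i] is the number of requests arriving in tick i. Workers then
    complete up to service_per_tick queued requests in the same tick.
    Requests that find the queue full are rejected (the "overflow" case).
    """
    queued = 0
    rejected = 0
    for arriving in arrivals:
        accepted = min(arriving, capacity - queued)
        rejected += arriving - accepted
        queued += accepted
        queued -= min(queued, service_per_tick)
    return rejected, queued


# Short spike: 10 req/tick, one tick of 200, then back to 10; workers serve 20/tick.
spike = [10] * 5 + [200] + [10] * 20
# Sustained overload: 30 req/tick arrive, but only 20/tick can be served.
overload = [30] * 100

for capacity in (50, 500):
    print(capacity,
          simulate_queue(spike, 20, capacity),
          simulate_queue(overload, 20, capacity))
```

With these inputs, raising the capacity from 50 to 500 absorbs the spike completely (0 rejections instead of 150). Under sustained overload the queue still rejects 10 requests per tick once it fills; the bigger queue only postpones the rejections (520 instead of 970 over 100 ticks) and leaves 480 requests waiting, roughly 24 ticks of extra queueing delay at 20 requests per tick.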

Re: Kudu - Dremio

2020-03-30 Thread Todd Lipcon
r 29, 2020 at 11:44 AM pino patera > wrote: > >> > >> Hi > >> anyone integrated Kudu into Dremio (/www.dremio.com) data lake? > >> Any alternative suggestion (i.e. Presto?) ? > >> > >> Thanks > -- Todd Lipcon Software Engineer, Cloudera

Re: Write transactions latency

2020-03-06 Thread Todd Lipcon
n? Am asking > because i see some correlation between update schema operation(deleting > range partition) time and number transactions in-flight on Kudu tablet > servers. > > Regards Dmitry > > > -- Todd Lipcon Software Engineer, Cloudera

Re: it is a good idea to dockerize kudu

2020-02-25 Thread Todd Lipcon
to dockerize > kudu. > The problem I concern about dockerizing kudu is storage performance loss. I > found out this Docker storage driver benchmarks (last updated October 2017) > https://github.com/chriskuehl/docker-storage-benchmark > > Best regards, > Kyle Zhike Chen > -- Todd Lipcon Software Engineer, Cloudera

Re: Incorta vs Kudu

2019-10-15 Thread Todd Lipcon
they can do real-time but I > just watched a demo and looks like it is classical batch/incremental > process. > https://community.incorta.com/t/18d8x2/data-hubmaterialized-view-question > > > https://community.incorta.com/t/18jndy/what-are-the-types-of-data-load-that-incorta-supports > > > -- Todd Lipcon Software Engineer, Cloudera

Re: Underutilization of hardware resources with smaller number of Tservers

2019-07-12 Thread Todd Lipcon
Would be useful to capture top -H during the workload as well to see if any particular threads are at 100%. Could be the reactor thread acting as a bottleneck On Fri, Jul 12, 2019, 10:54 AM Adar Lieber-Dembo wrote: > Thanks for the detailed summary and analysis. I want to make sure I > understan

Re:

2019-07-02 Thread Todd Lipcon
34 kudu::Thread::SuperviseThread() > > @ 0x7f049719cdd5 start_thread > > @ 0x7f0495473ead __clone > This is just a warning about a potential latency blip, and likely completely unrelated to the problem you're reporting. -Todd -- Todd Lipcon Software Engineer, Cloudera

Re: WAL size estimation

2019-06-26 Thread Todd Lipcon
696..1176034391 2 (3920336..3923031) 2696 > 0 > 13: [36592..39191]: 1176037440..1176040039 2 (3926080..3928679) 2600 > 0 > 14: [39192..41839]: 1176072008..1176074655 2 (3960648..3963295) 2648 > 0 > 15: [41840..44423]: 1176097752..1176100335 2 (

Re: Need information about internals of InList predicate

2019-06-26 Thread Todd Lipcon
ly, I > was unable find any information, a few JIRA tasks only, but that didn't > helped. > > https://issues.apache.org/jira/browse/KUDU-2853 > https://issues.apache.org/jira/browse/KUDU-1644 > > Best regards, Sergey. > -- Todd Lipcon Software Engineer, Cloudera

Re: WAL size estimation

2019-06-19 Thread Todd Lipcon
y 8 blocks and all other blocks are the > hole. > > > So looks like I can use formulas with confidence. > Normal case: 8 MB/segment * 80 max segments * 2000 tablets = 1,280,000 MB > = ~1.3 TB (+ some minor index overhead) > Worse case: 8 MB/segment * 1 segment * 2000 tablets = 1,280,
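The quoted normal-case estimate is just segment size × retained segments × tablet count. A minimal sketch using the numbers from the quoted message (8 MB segments, 80 segments, 2000 tablets); `wal_estimate_mb` is a hypothetical helper and, like the quoted formula, it leaves out index-file overhead.

```python
def wal_estimate_mb(segment_size_mb, segments_per_tablet, tablets):
    """Rough WAL disk usage in MB, ignoring index-file overhead."""
    return segment_size_mb * segments_per_tablet * tablets


normal_mb = wal_estimate_mb(segment_size_mb=8, segments_per_tablet=80, tablets=2000)
# Decimal terabytes (1 TB = 1,000,000 MB), which is where "~1.3 TB" comes from.
print(f"{normal_mb:,} MB = {normal_mb / 1_000_000:.2f} TB")
```

This prints `1,280,000 MB = 1.28 TB`.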

Re: WAL size estimation

2019-06-18 Thread Todd Lipcon
on? > > 3. Not a question. Please, consider adding documentation about the > estimation of WAL storage. Also, I can't found any mentions about index > files, except here > https://kudu.apache.org/docs/scaling_guide.html#file_descriptors. > > Thanks! > > -- > with best regards, Pavel Martynov > -- Todd Lipcon Software Engineer, Cloudera

Re: Kudu CLI tool JSON format

2019-06-11 Thread Todd Lipcon
> As you can see kudu tool encodes zeros as \u, but don't encode some > other non-text bytes. > > What do you think about it? > > -- > with best regards, Pavel Martynov > -- Todd Lipcon Software Engineer, Cloudera

[ANNOUNCE] Welcoming Yingchun Lai as a Kudu committer and PMC member

2019-06-05 Thread Todd Lipcon
Hi Kudu community, I'm happy to announce that the Kudu PMC has voted to add Yingchun Lai as a new committer and PMC member. Yingchun has been contributing to Kudu for the last 6-7 months and contributed a number of bug fixes, improvements, and features, including: - new CLI tools (eg 'kudu table

Re: problems with impala+kudu

2019-05-17 Thread Todd Lipcon
(code THRIFTTRANSPORT): > TTransportException('TSocket read 0 bytes',). > Could you pls tell me how to deal with this problem? By the way, the kudu > is installed by rpm, the relatived url: > https://github.com/MartinWeindel/kudu-rpm. > > > Best wishes. > yours truly, > Jack Lin > -- Todd Lipcon Software Engineer, Cloudera

Re: "broadcast" tablet replication for kudu?

2019-04-24 Thread Todd Lipcon
used with every single query and to make things worse joined > more than once in the same query. > > Is there a way to replicate this table on every node to improve > performance and avoid broadcasting this table every time? > > On Mon, Jul 23, 2018 at 10:52 AM Todd Lipcon wrote

Re: close Kudu client on timeout

2019-01-17 Thread Todd Lipcon
>> Mike >>>> >>>> Sent from my iPhone >>>> >>>> > On Jan 16, 2019, at 12:27 PM, Boris Tyukin >>>> wrote: >>>> > >>>> > Hi guys, >>>> > >>>> > is there a setting on Kudu se

Re: trying to install kudu from source

2018-12-10 Thread Todd Lipcon
\ > libsasl2-dev \ > libsasl2-modules \ > libsasl2-modules-gssapi-mit \ > libssl-dev \ > libtool \ > lsb-release \ > make \ > ntp \ > net-tools \ > openjdk-8-jdk \ > openssl \ > patch \ > python-dev \ > python-pip \ > python3-dev \ > python3 \ > python3-pip \ > pkg-config \ > python \ > rsync \ > unzip \ > vim-common \ > wget > > #Install Kudu > #RUN git clone https://github.com/apache/kudu \ > user@kudu.apache.orgWORKDIR / > RUN wget > https://www-us.apache.org/dist/kudu/1.8.0/apache-kudu-1.8.0.tar.gz > RUN mkdir -p /kudu && tar -xzf apache-kudu-1.8.0.tar.gz -C /kudu > --strip-components=1 > RUN ls / > > RUN cd /kudu \ > && thirdparty/build-if-necessary.sh > RUN cd /kudu && mkdir -p build/release \ > && cd /kudu/build/release \ > && ../../thirdparty/installed/common/bin/cmake -DCMAKE_BUILD_TYPE=release > -DCMAKE_INSTALL_PREFIX:PATH=/usr ../.. \ > && make -j4 > > RUN cd /kudu/build/release \ > && make install > > > > > > > -- Todd Lipcon Software Engineer, Cloudera

Re: strange behavior of getPendingErrors

2018-11-17 Thread Todd Lipcon
Kind regards, > > Alexey > > On Fri, Nov 16, 2018 at 7:24 PM Boris Tyukin > wrote: > >> Hi Todd, >> >> We are on Kudu 1.5 still and I used Kudu client 1.7 >> >> Thanks, >> Boris >> >> On Fri, Nov 16, 2018, 17:07 Todd Lipcon > >

Re: strange behavior of getPendingErrors

2018-11-16 Thread Todd Lipcon
NT32 key=9, STRING value=value 9, UNIXTIME_MICROS > dt_tm=2018-11-16T20:57:03.603000Z > INT32 key=3, STRING value=value 3, UNIXTIME_MICROS > dt_tm=2018-11-16T20:57:03.595000Z > INT32 key=10, STRING value=NULL, UNIXTIME_MICROS > dt_tm=2018-11-16T20:57:03.603000Z > INT32 key=5, STRING value=value 5, UNIXTIME_MICROS > dt_tm=2018-11-16T20:57:03.597000Z > INT32 key=7, STRING value=value 7, UNIXTIME_MICROS > dt_tm=2018-11-16T20:57:03.598000Z > > > > > -- Todd Lipcon Software Engineer, Cloudera

Re: cannot import kudu.client

2018-08-31 Thread Todd Lipcon
> > > (env) root@boot2docker:~/kudu# python > Python 2.7.12 (default, Dec 4 2017, 14:50:18) > [GCC 5.4.0 20160609] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> import kudu > >>>

Re: Dictionary encoding

2018-08-06 Thread Todd Lipcon
ity, > > Does any body know what is the maximum distinct values of a String column > that Kudu considers in order to set its encoding to Dictionary? Many thanks > :) > > br, > > -- Todd Lipcon Software Engineer, Cloudera

Re: Re: Recommended maximum amount of stored data per tablet server

2018-08-02 Thread Todd Lipcon
to share with other systems. One recommendation, though, is to consider using a dedicated disk for the Kudu WAL and metadata. This can help performance, since the WAL can be sensitive to other heavy workloads monopolizing bandwidth on the same spindle. -Todd > > At 2018-08-03 02:26:37, "Tod

Re: Recommended maximum amount of stored data per tablet server

2018-08-02 Thread Todd Lipcon
with 15 * 4TB spinning disk drives and 256GB > RAM, 48 cpu cores. Does it mean the other 52(= 15 * 4 - 8) TB space is > recommended to leave for other systems? We prefer to make the machine > dedicated to Kudu. Can tablet server leverage the whole space efficiently? > > > > Thanks, > > Quanlong > -- Todd Lipcon Software Engineer, Cloudera

Re: Re: Re: Why RowSet size is much smaller than flush_threshold_mb

2018-08-01 Thread Todd Lipcon
defaults or giving some more prescriptive advice? I'm a little nervous that saying "here are all the internals, and here are 100 config flags to study" will scare users more than help them :) -Todd > > At 2018-08-02 01:06:40,"Todd Lipcon" wrote: > > On Wed, A

Re: Re: Why RowSet size is much smaller than flush_threshold_mb

2018-08-01 Thread Todd Lipcon
make this trade-off. -Todd > At 2018-06-15 23:41:17, "Todd Lipcon" wrote: > > Also, keep in mind that when the MRS flushes, it flushes into a bunch of > separate RowSets, not 1:1. It "rolls" to a new RowSet every N MB (N=32 by > default). This is set by --budge
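Rough arithmetic for why a flush yields many small RowSets rather than one ~1 GB file. This sketch assumes the 32 MB roll size mentioned above and, for simplicity, that the flush writes about as much data as the 1024 MB `flush_threshold_mb` default quoted in the thread. On-disk size after encoding and compression usually differs from in-memory size, so treat the result as an illustration only; `rowsets_per_flush` is a hypothetical helper.

```python
import math


def rowsets_per_flush(flushed_mb, roll_size_mb=32):
    """Approximate RowSet count if a flush rolls to a new RowSet
    every roll_size_mb of written data."""
    return math.ceil(flushed_mb / roll_size_mb)


print(rowsets_per_flush(1024))  # 32
```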

Re: "broadcast" tablet replication for kudu?

2018-07-23 Thread Todd Lipcon
hing like an extremely high replication count.* > > *I could see bumping the replication count to 5 for these tables since the > extra storage cost is low and it will ensure higher availability of the > important central tables, but I'd be surprised if there is any measurable > p

Re: "broadcast" tablet replication for kudu?

2018-07-23 Thread Todd Lipcon
Impala 2.12. The external RPC protocol is still Thrift. Todd On Mon, Jul 23, 2018, 7:02 AM Clifford Resnick wrote: > Is this impala 3.0? I’m concerned about breaking changes and our RPC to > Impala is thrift-based. > > From: Todd Lipcon > Reply-To: "user@kudu.apache.org"

Re: "broadcast" tablet replication for kudu?

2018-07-23 Thread Todd Lipcon
thread's context >>>>> perhaps partition kudu table, even if small, into multiple tablets), it >>>>> was >>>>> to speed up joins/exchanges, not to parallelize the scan. >>>>> >>>>> For example recently we ran into

Re: Deleting from kudu table issue

2018-07-11 Thread Todd Lipcon
s or avoid it . > > Thanks ! > > Best regards . > > > -- > > wang jiaxi > -- Todd Lipcon Software Engineer, Cloudera

Re: spark on kudu performance!

2018-07-05 Thread Todd Lipcon
e should happen automatically so long as the filter predicate has been pushed down. Running 'explain()' and showing us the results, along with the code you used to create your table, will help us understand what might be causing the performance problem. -Todd -- Todd Lipcon Software Engineer, Cloudera

Re: Adding a new kudu master

2018-07-05 Thread Todd Lipcon
te should be executed: > > UPDATE hive_meta_store_database.TABLE_PARAMS > > SET PARAM_VALUE = 'master-1,master-2,master-3' > > WHERE PARAM_KEY = 'kudu.master_addresses' AND PARAM_VALUE = > 'master-1,master-2,master-3,master-4'; > > After upgrades, the master-4 node to be removed by running > steps 1-5. > > > > Thanks! > > > > Best regards, > > Sergejs Andrejevs > > > -- Todd Lipcon Software Engineer, Cloudera

Re: kudu Insert、Update、Delete operating data lost

2018-06-15 Thread Todd Lipcon
ssion contains Insert, Update, Delete operations, if > the database does not exist in the data there will be > some new data loss, how to avoid such problems. > -- Todd Lipcon Software Engineer, Cloudera

Re: Why RowSet size is much smaller than flush_threshold_mb

2018-06-15 Thread Todd Lipcon
_time_us":1288971,"lbm_reads_1-10_ms":32,"lbm_reads_10-100_ms":41,"lbm_reads_lt_1ms":4641,"lbm_write_time_us":122520,"lbm_writes_lt_1ms":2799,"mutex_wait_us":25,"spinlock_wait_cycles":155264,"tcmalloc_contention_cycles":768,"thread_start_us":677,"threads_started":14,"wal-append.queue_time_us":300} >> >> The flush_threshold_mb is set in the default value (1024). Wouldn't the >> flushed file size be ~1GB? >> >> I think increasing the initial RowSet size can reduce compactions and >> then reduce the impact of other ongoing operations. It may also improve the >> flush performance. Is that right? If so, how can I increase the RowSet size? >> >> I'd be grateful if someone can make me clear about these! >> >> Thanks, >> Quanlong >> > > -- Todd Lipcon Software Engineer, Cloudera

Re: Java KuduClient table references

2018-06-05 Thread Todd Lipcon
-- Todd Lipcon Software Engineer, Cloudera

Re: How to install the latest kudu release and find the compatible Impala versions?

2018-05-22 Thread Todd Lipcon
On Mon, May 21, 2018 at 4:37 PM, Quanlong Huang wrote: > Hi friends, > > We're trying to benchmark Impala+kudu to compare with other lambda > architectures like Druid. So we hope we can install the latest release > version of Impala (2.12.0) and kudu (1.7.0). However, when following the > instal

Re: will upsert have bad effect on scan performace?

2018-05-21 Thread Todd Lipcon
ct even though these data were firstly loaded. > i do not know compaction mechanism of kudu, will it lead to many > compaction, thus lead to bad scan performance. > > Best regards. > -- Todd Lipcon Software Engineer, Cloudera

Re: scan performance super bad

2018-05-14 Thread Todd Lipcon
quot;78" <= VALUES < "785000", > PARTITION "785000" <= VALUES < "79", > PARTITION "79" <= VALUES < "795000", > PARTITION "795000" <= VALUES < "80", > PARTITION "80" <= VALUES < "805000", > PARTITION "805000" <= VALUES < "81", > PARTITION "81" <= VALUES < "815000", > PARTITION "815000" <= VALUES < "82", > PARTITION "82" <= VALUES < "825000", > PARTITION "825000" <= VALUES < "83", > PARTITION "83" <= VALUES < "835000", > PARTITION "835000" <= VALUES < "84", > PARTITION "84" <= VALUES < "845000", > PARTITION "845000" <= VALUES < "85", > PARTITION "85" <= VALUES < "855000", > PARTITION "855000" <= VALUES < "86", > PARTITION "86" <= VALUES < "865000", > PARTITION "865000" <= VALUES < "87", > PARTITION "87" <= VALUES < "875000", > PARTITION "875000" <= VALUES < "88", > PARTITION "88" <= VALUES < "885000", > PARTITION "885000" <= VALUES < "89", > PARTITION "89" <= VALUES < "895000", > PARTITION "895000" <= VALUES < "90", > PARTITION "90" <= VALUES < "905000", > PARTITION "905000" <= VALUES < "91", > PARTITION "91" <= VALUES < "915000", > PARTITION "915000" <= VALUES < "92", > PARTITION "92" <= VALUES < "925000", > PARTITION "925000" <= VALUES < "93", > PARTITION "93" <= VALUES < "935000", > PARTITION "935000" <= VALUES < "94", > PARTITION "94" <= VALUES < "945000", > PARTITION "945000" <= VALUES < "95", > PARTITION "95" <= VALUES < "955000", > PARTITION "955000" <= VALUES < "96", > PARTITION "96" <= VALUES < "965000", > PARTITION "965000" <= VALUES < "97", > PARTITION "97" <= VALUES < "975000", > PARTITION "975000" <= VALUES < "98", > PARTITION "98" <= VALUES < "985000", > PARTITION "985000" <= VALUES < "99", > PARTITION "99" <= VALUES < "995000", > PARTITION VALUES >= "995000" > ) > > > So it looks like you have a numeric value being stored here in the string column. Are you sure that you are properly zero-padding when creating your key? For example if you accidentally scan from "50_..." to "80_..." you will end up scanning a huge portion of your table. 
> i did not delete rows in this table ever. > > my scanner code is below: > buildKey method will build the lower bound and the upper bound, the unique > id is same, the startRow offset(third part) is 0, and the endRow offset is > , startRow and endRow only differs from time. > though the max offset is big(999), generally it is less than 100. > > private KuduScanner buildScanner(Metric startRow, Metric endRow, > List dimensionIds, List dimensionFilterList) { > KuduTable kuduTable = > kuduService.getKuduTable(BizConfig.parseFrom(startRow.getBizId())); > > PartialRow lower = kuduTable.getSchema().newPartialRow(); > lower.addString("key", buildKey(startRow)); > PartialRow upper = kuduTable.getSchema().newPartialRow(); > upper.addString("key", buildKey(endRow)); > > LOG.info("build scanner. lower = {}, upper = {}", buildKey(startRow), > buildKey(endRow)); > > KuduScanner.KuduScannerBuilder builder = > kuduService.getKuduClient().newScannerBuilder(kuduTable); > builder.setProjectedColumnNames(COLUMNS); > builder.lowerBound(lower); > builder.exclusiveUpperBound(upper); > builder.prefetching(true); > builder.batchSizeBytes(MAX_BATCH_SIZE); > > if (CollectionUtils.isNotEmpty(dimensionFilterList)) { > for (int i = 0; i < dimensionIds.size() && i < MAX_DIMENSION_NUM; > i++) { > for (DimensionFilter dimensionFilter : dimensionFilterList) { > if (!Objects.equals(dimensionFilter.getDimensionId(), > dimensionIds.get(i))) { > continue; > } > ColumnSchema columnSchema = > kuduTable.getSchema().getColumn(String.format("dimension_%02d", i)); > KuduPredicate predicate = buildKuduPredicate(columnSchema, > dimensionFilter); > if (predicate != null) { > builder.addPredicate(predicate); > LOG.info("add predicate. predicate = {}", > predicate.toString()); > } > } > } > } > return builder.build(); > } > > What client version are you using? 1.7.0? > i checked the metrics, only get content below, it seems no relationship > with my table. 
> Looks like you got the metrics from the kudu master, not a tablet server. You need to figure out which tablet server you are scanning and grab the metrics from that one. -Todd -- Todd Lipcon Software Engineer, Cloudera
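Why zero-padding matters: Kudu orders STRING key columns lexicographically (byte by byte), so numbers stored as unpadded strings do not sort numerically. This sketch uses plain Python string comparison, which behaves the same way for ASCII digits. `pad_key` and the 6-digit width are hypothetical; the thread's actual `buildKey` is not shown, and its keys have several parts.

```python
def pad_key(n: int, width: int = 6) -> str:
    """Zero-pad a non-negative integer so string order matches numeric order."""
    if n < 0 or len(str(n)) > width:
        raise ValueError(f"{n} does not fit in {width} digits")
    return str(n).zfill(width)


numbers = [5, 9, 50, 80, 785000]

# Unpadded strings sort lexicographically, not numerically.
unpadded = sorted(str(n) for n in numbers)
# A "50" .. "80" range over unpadded keys also picks up "785000".
unpadded_hits = [k for k in unpadded if "50" <= k < "80"]

padded = sorted(pad_key(n) for n in numbers)
padded_hits = [k for k in padded if pad_key(50) <= k < pad_key(80)]

print(unpadded)       # ['5', '50', '785000', '80', '9']
print(unpadded_hits)  # ['50', '785000']
print(padded_hits)    # ['000050']
```

The same idea applies to any numeric component of a composite string key: pad every component to a fixed width so that a scan between two bounds touches only the intended rows.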

Re: 答复: Issue in data loading in Impala + Kudu

2018-05-13 Thread Todd Lipcon
{remote=136.243.74.42:7050 >> (slave5), user_credentials={real_user=root}} blocked reactor thread for >> 35859.8us >> >> I0507 09:38:15.942150 29882 outbound_call.cc:288] RPC callback for RPC >> call kudu.tserver.TabletServerService.Write -> {remote=136.243.74.42:7050 >> (slave5), user_credentials={real_user=root}} blocked reactor thread for >> 40664.9us >> >> I0507 09:38:17.495046 29882 outbound_call.cc:288] RPC callback for RPC >> call kudu.tserver.TabletServerService.Write -> {remote=136.243.74.42:7050 >> (slave5), user_credentials={real_user=root}} blocked reactor thread for >> 49514.6us >> >> I0507 09:46:12.664149 4507 coordinator.cc:783] Release admission control >> resources for query_id=3e4a4c646800e1d9:c859bb7f >> >> F0507 09:46:12.673912 29258 error-util.cc:148] Check failed: >> log_entry.count > 0 (-1831809966 vs. 0) >> >> Wrote minidump to /tmp/minidumps/impalad/a9113d9b-bc3d-488a-1feebf9b-47b42022.dmp >> >> >> >> *Note*: >> >> We are executing the queries on 8 node cluster with the following >> configuration >> >> Cluster : 8 Node Cluster (48 GB RAM , 8 CPU Core and 2 TB hard-disk each, >> Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz >> >> >> >> >> >> -- >> >> Regards, >> >> Geetika Gupta >> > > > > -- > Regards, > Geetika Gupta > -- Todd Lipcon Software Engineer, Cloudera

Re: scan performance super bad

2018-05-13 Thread Todd Lipcon
a between the bound is about 8000, so i should call hundreds > times nextRows() to fetch all data, and it finally cost several minutes. > > i don't know why this happened and how to resolve it. Maybe the final > solution is that i should giving up kudu, using hbase instead... > -- Todd Lipcon Software Engineer, Cloudera

Re: Kudu read - performance issue

2018-05-11 Thread Todd Lipcon
if you had for example: pre-chunk in-list: 1,2,3,4,5,6; chunk 1: col2 IN (1,6); chunk 2: col2 IN (2,5); chunk 3: col2 IN (3,4), then you will actually scan over the middle portion of that table 3 times. If you sort the in-list before chunking, you'll avoid the multiple-scan effect here. -Todd -- Todd Lipcon Software Engineer, Cloudera
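The chunking effect described above, modelled under one simplifying assumption: each chunk's scan covers the contiguous key range from the chunk's smallest to its largest IN-list value. The helper names are hypothetical; the unsorted chunks are the example from the message.

```python
def chunk_ranges(chunks):
    """Key range [min, max] that a scan for each IN-list chunk must cover."""
    return [(min(c), max(c)) for c in chunks]


def times_scanned(chunks, key):
    """How many chunk ranges include the given key value."""
    return sum(lo <= key <= hi for lo, hi in chunk_ranges(chunks))


def chunk_sorted(in_list, chunk_size):
    """Sort the IN-list first, then split it into contiguous chunks."""
    values = sorted(in_list)
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


unsorted_chunks = [[1, 6], [2, 5], [3, 4]]            # example from the email
sorted_chunks = chunk_sorted([1, 6, 2, 5, 3, 4], 2)   # [[1, 2], [3, 4], [5, 6]]

print([times_scanned(unsorted_chunks, k) for k in range(1, 7)])  # [1, 2, 3, 3, 2, 1]
print([times_scanned(sorted_chunks, k) for k in range(1, 7)])    # [1, 1, 1, 1, 1, 1]
```

With unsorted chunks the middle keys (3 and 4) fall inside all three ranges; after sorting, the ranges are disjoint and every key is covered once.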

Re: Column Compression and Encoding

2018-05-08 Thread Todd Lipcon
ot the entire PK, it will only be used on the read path when that actual column is selected, and it has the same performance impact (positive or negative) as any other column in the row. -Todd -- Todd Lipcon Software Engineer, Cloudera

Re: Column Compression and Encoding

2018-05-08 Thread Todd Lipcon
subjects that users can use in > the future. Thanks. > > Regards, > > -- Todd Lipcon Software Engineer, Cloudera

Re: Kudu Exception -Couldnot find any valid location.

2018-04-16 Thread Todd Lipcon
n on the table like > insert into the table, it throws an exception like: Could not find any valid > location. Unknown host exception. Thanks in advance for your valuable time. -- Todd Lipcon Software Engineer, Cloudera

Re: question about kudu performance!

2018-04-13 Thread Todd Lipcon
min_ratio (default 0.1). Raising this would decrease the frequency of major delta compaction, but I think there is likely something else going on here. -Todd > > > > Can you give me some suggestions to optimize this performance problem? Usually the best way to improve performance is by thinking carefully about schema design, partitioning, and workload, rather than tuning configuration. Maybe you can share more about your workload, schema, and partitioning. -Todd -- Todd Lipcon Software Engineer, Cloudera

Re: Limitations on total amount of data stored in one kudu table

2018-03-21 Thread Todd Lipcon
; limitation "Maximum number of tablets per table for each tablet server is > 60, post-replication"? Is it possible that this restriction will be removed? > See above. -Todd -- Todd Lipcon Software Engineer, Cloudera
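One reading of the quoted limitation, as arithmetic: if each tablet server may hold at most 60 replicas of a given table, counted after replication, then a new table can have at most 60 × (tablet servers) ÷ (replication factor) tablets. This is a minimal sketch under that assumption; `max_tablets_at_creation` is a hypothetical helper and the 5-server cluster is illustrative.

```python
def max_tablets_at_creation(tablet_servers, replication_factor, per_server_limit=60):
    """Upper bound on tablets in one table, assuming each tablet server may
    host at most per_server_limit replicas of it (counted post-replication)."""
    return (per_server_limit * tablet_servers) // replication_factor


print(max_tablets_at_creation(tablet_servers=5, replication_factor=3))  # 100
```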

Re: 答复: A few questions for using Kudu

2018-03-19 Thread Todd Lipcon
f replicas to 3 in order to have fault tolerance. > > XiaoNing: So if we want to have fault tolerance, we should at least set > the replica number to be 3, right? > That's right. -Todd -- Todd Lipcon Software Engineer, Cloudera
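Why 3 is the minimum for fault tolerance: Kudu replicates each tablet with Raft, which needs a majority of replicas available to make progress. A small sketch of the majority arithmetic (the helper names are hypothetical): with 2 replicas you still cannot lose any, with 3 you can lose 1, and with 5 you can lose 2.

```python
def majority(replication_factor: int) -> int:
    """Smallest number of replicas that forms a Raft majority."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can fail while a majority is still available."""
    return replication_factor - majority(replication_factor)


for rf in (1, 2, 3, 5):
    print(rf, majority(rf), tolerated_failures(rf))
# 1 1 0
# 2 2 0
# 3 2 1
# 5 3 2
```

This is also why even replication factors are not useful: 4 replicas tolerate only 1 failure, the same as 3.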

Re: Kudu client close exception

2018-03-19 Thread Todd Lipcon
netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > > > > Thanks > > Rainerdun > > > > > -- Todd Lipcon Software Engineer, Cloudera

Re: "broadcast" tablet replication for kudu?

2018-03-16 Thread Todd Lipcon
t many nodes. > > Wouldn't it be useful here for Cliff's small dims to be partitioned into a > couple tablets to similarly improve parallelism? > > -m > > On Fri, Mar 16, 2018 at 2:29 PM, Todd Lipcon wrote: > >> On Fri, Mar 16, 2018 at 2:19 PM, Cliff Resnick

Re: "broadcast" tablet replication for kudu?

2018-03-16 Thread Todd Lipcon
re basic reason. > Impala could definitely be smarter, just a matter of programming Kudu-specific join strategies into the optimizer. Today, the optimizer isn't aware of the unique properties of Kudu scans vs other storage mechanisms. -Todd > > -Cliff > > On Fri, Ma

Re: "broadcast" tablet replication for kudu?

2018-03-16 Thread Todd Lipcon
perhaps to sum thing up, if nearly 100% of my metadata scan are single > Primary Key lookups followed by a tiny broadcast then am I really just > splitting hairs performance-wise between Kudu and HDFS-cached parquet? > > From: Todd Lipcon > Reply-To: "user@kudu.apache.org"

Re: "broadcast" tablet replication for kudu?

2018-03-16 Thread Todd Lipcon
du. >>>> One Redshift feature that we will miss is its ALL Distribution, where a >>>> copy of a table is maintained on each server. We define a number of >>>> metadata tables this way since they are used in nearly every query. We are >>>> considering using parquet in HDFS cache for these, and Kudu would be a much >>>> better fit for the update semantics but we are worried about the additional >>>> contention. I'm wondering if having a Broadcast, or ALL, tablet >>>> replication might be an easy feature to add to Kudu? >>>> >>>> -Cliff >>>> >>> >>> >> > -- Todd Lipcon Software Engineer, Cloudera

Re: "broadcast" tablet replication for kudu?

2018-03-16 Thread Todd Lipcon
define a number of >>> metadata tables this way since they are used in nearly every query. We are >>> considering using parquet in HDFS cache for these, and Kudu would be a much >>> better fit for the update semantics but we are worried about the additional >>> contention. I'm wondering if having a Broadcast, or ALL, tablet >>> replication might be an easy feature to add to Kudu? >>> >>> -Cliff >>> >> >> > -- Todd Lipcon Software Engineer, Cloudera

Re: Follow-up for "Kudu cluster performance cannot grow up with machines added"

2018-03-13 Thread Todd Lipcon
hroughput to drop. This is tracked by KUDU-1693. I believe there was another JIRA somewhere related as well, but can't seem to find it. Unfortunately fixing it is not straightforward, though would have good impact for these cases where a single writer is fanning out to tens or hundreds of tablets. -Todd -- Todd Lipcon Software Engineer, Cloudera

Re: Kudu cluster performance cannot grow up with machines added

2018-03-11 Thread Todd Lipcon
What client are you using to benchmark? You might also be bound by the client performance. On Mar 11, 2018 2:04 PM, "Brock Noland" wrote: > Hi, > > I'd verify that the new nodes are assigned tablets? Along with > considering an increase the number of partitions on the table being tested. > > On

Re: Kudu as a Graphite backend

2018-03-05 Thread Todd Lipcon
-- Todd Lipcon Software Engineer, Cloudera

Re: Impala Parquet to Kudu 1.5 - severe ingest performance degradation

2018-02-28 Thread Todd Lipcon
ables and as you can see from my example query, > it is a really straight select from a table - no joins, no predicates and > no complex calculations. > > Thanks again, > Boris > > On Thu, Feb 22, 2018 at 2:44 PM, Todd Lipcon wrote: > >> In addition to what Hao suggests,

Re: swap data in Kudu table

2018-02-23 Thread Todd Lipcon
not seem to find a good strategy. The only thing came >> to my mind is to drop the production table and rename a staging table to >> production table as the last step of the job, but in this case we are going >> to lose statistics and security permissions. >> >> Any other ideas? >> >> Thanks! >> Boris >> > > -- Todd Lipcon Software Engineer, Cloudera

Re: Impala Parquet to Kudu 1.5 - severe ingest performance degradation

2018-02-22 Thread Todd Lipcon
_id, >> >> CAST(nomen_string_flag as STRING) nomen_string_flag, >> >> src_event_id, >> >> CAST(last_utc_ts as BIGINT) last_utc_ts, >> >> device_free_txt, >> >> CAST(trait_bit_map as STRING) trait_bit_map, >> >> CAST(clu_subkey1_flag as STRING) clu_subkey1_flag, >> >> CAST(clinsig_updt_dt_tm as BIGINT) clinsig_updt_dt_tm, >> >> CAST(event_end_dt_tm as BIGINT) event_end_dt_tm, >> >> CAST(event_start_dt_tm as BIGINT) event_start_dt_tm, >> >> CAST(expiration_dt_tm as BIGINT) expiration_dt_tm, >> >> CAST(verified_dt_tm as BIGINT) verified_dt_tm, >> >> CAST(src_clinsig_updt_dt_tm as BIGINT) src_clinsig_updt_dt_tm, >> >> CAST(updt_dt_tm as BIGINT) updt_dt_tm, >> >> CAST(valid_from_dt_tm as BIGINT) valid_from_dt_tm, >> >> CAST(valid_until_dt_tm as BIGINT) valid_until_dt_tm, >> >> CAST(performed_dt_tm as BIGINT) performed_dt_tm, >> >> txn_id_text, >> >> CAST(ingest_dt_tm as BIGINT) ingest_dt_tm >> >> FROM v500.clinical_event >> > > -- Todd Lipcon Software Engineer, Cloudera

Re: Renaming hostname of tserver/master

2018-02-04 Thread Todd Lipcon
ious to hear what approach you took. -Todd On Tue, Jan 30, 2018 at 11:08 PM, Pavel Martynov wrote: > Ok, I found ticket https://issues.apache.org/jira/browse/KUDU-418, which > fired at me. > -- Todd Lipcon Software Engineer, Cloudera

Re: Using Kudu to Handle Huge amount of Data

2018-02-04 Thread Todd Lipcon
moke tests of Kudu on ~800 nodes before. > > Looking forward to your inputs on any organisation using Kudu where data > volumes of more than 10 TB is ingested everyday. > Hope some other users can chime in. -Todd -- Todd Lipcon Software Engineer, Cloudera

Re: Bulk / Initial load of large tables into Kudu using Spark

2018-01-30 Thread Todd Lipcon
9, 2018 at 2:22 PM, Todd Lipcon wrote: > >> On Mon, Jan 29, 2018 at 11:18 AM, Patrick Angeles >> wrote: >> >>> Hi Boris. >>> >>> 1) I would like to bypass Impala as data for my bulk load coming from >>>> sqoop and avro files are stored

Re: Bulk / Initial load of large tables into Kudu using Spark

2018-01-29 Thread Todd Lipcon
table >> SELECT * FROM some_csv_table does the trick. >> >> You can also use Kudu’s MapReduce OutputFormat to load data from HDFS, >> HBase, or any other data store that has an InputFormat. >> >> No tool is provided to load data directly into Kudu’s on-disk data >> format. We have found that for many workloads, the insert performance of >> Kudu is comparable to bulk load performance of other systems. >> > > -- Todd Lipcon Software Engineer, Cloudera

Re: new Kudu benchmarks

2018-01-08 Thread Todd Lipcon
to new releases coming up! > > Boris > > On Fri, Jan 5, 2018 at 9:08 PM, Todd Lipcon wrote: > >> On Fri, Jan 5, 2018 at 5:50 PM, Boris Tyukin >> wrote: >> >>> Hi Todd, >>> >>> thanks for your feedback! sure will be happy to update my po

Re: new Kudu benchmarks

2018-01-05 Thread Todd Lipcon
microsecond precision, so that's what Kudu implemented internally. With 64 bits there is still enough range to store dates for 584,554 years at microsecond precision. I think https://impala.apache.org/docs/build/html/topics/impala_timestamp.html has some info about Kudu compatibility and limitations. -Todd -- Todd Lipcon Software Engineer, Cloudera
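The 584,554-year figure can be reproduced by dividing 2^64 microseconds by the length of an average Gregorian year (365.2425 days). The second value assumes the timestamp is a signed 64-bit integer counted from the Unix epoch, which would give about 292,277 years on each side of 1970; treat that part as an assumption.

```python
# Microseconds in an average Gregorian year: 365.2425 days * 86,400 s.
MICROS_PER_YEAR = 31_556_952 * 1_000_000

span_years = 2**64 // MICROS_PER_YEAR   # total range of a 64-bit value
each_side = 2**63 // MICROS_PER_YEAR    # per side of the epoch, if signed

print(span_years, each_side)  # 584554 292277
```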

Re: new Kudu benchmarks

2018-01-05 Thread Todd Lipcon
ation whereas a string representation would be a little clearer. Thanks -Todd > > On Fri, Jan 5, 2018 at 11:13 AM, Todd Lipcon wrote: > >> Oh, one other piece of feedback: maybe worth editing the title to say "vs >> Apache Parquet" instead of "vs Apache

Re: new Kudu benchmarks

2018-01-05 Thread Todd Lipcon
Oh, one other piece of feedback: maybe worth editing the title to say "vs Apache Parquet" instead of "vs Apache Impala" since in all cases you are using Impala as the query engine? -Todd On Fri, Jan 5, 2018 at 11:06 AM, Todd Lipcon wrote: > Hey Boris, > > Thank

Re: new Kudu benchmarks

2018-01-05 Thread Todd Lipcon
pers for such an amazing and much-needed product. > > Boris > > > -- Todd Lipcon Software Engineer, Cloudera

Re: Data inconsistency after restart

2018-01-04 Thread Todd Lipcon
uster. See https://kudu.apache.org/docs/command_line_tools_reference.html#cluster-ksck for more details. For restarting a cluster, I >>>>>> would recommend taking down all tablet servers at once, otherwise >>>>>> tablet >>>>>> replicas may try to replicate data from the server that was taken >>>>>> down. >>>>>> >>>>>> Hope this helped, >>>>>> Andrew >>>>>> >>>>>> On Tue, Dec 5, 2017 at 10:42 AM, Petter von Dolwitz (Hem) < >>>>>> petter.von.dolw...@gmail.com> wrote: >>>>>> >>>>>> Hi Kudu users, >>>>>>> >>>>>>> We just started to use Kudu (1.4.0+cdh5.12.1). To make a baseline for >>>>>>> evaluation we ingested 3 month worth of data. During ingestion we >>>>>>> were >>>>>>> facing messages from the maintenance threads that a soft memory >>>>>>> limit were >>>>>>> reached. It seems like the background maintenance threads stopped >>>>>>> performing their tasks at this point in time. It also so seems like >>>>>>> the >>>>>>> memory was never recovered even after stopping ingestion so I guess >>>>>>> there >>>>>>> was a large backlog being built up. I guess the root cause here is >>>>>>> that we >>>>>>> were a bit too conservative when giving Kudu memory. After a >>>>>>> restart a >>>>>>> lot of maintenance tasks were started (i.e. compaction). >>>>>>> >>>>>>> When we verified that all data was inserted we found that some data >>>>>>> was missing. We added this missing data and on some chunks we got the >>>>>>> information that all rows were already present, i.e impala says >>>>>>> something >>>>>>> like Modified: 0 rows, nnn errors. Doing the verification again >>>>>>> now >>>>>>> shows that the Kudu table is complete. So, even though we did not >>>>>>> insert >>>>>>> any data on some chunks, a count(*) operation over these chunks now >>>>>>> returns >>>>>>> a different value. >>>>>>> >>>>>>> Now to my question. Will data be inconsistent if we recycle Kudu >>>>>>> after >>>>>>> seeing soft memory limit warnings?
>>>>>>> >>>>>>> Is there a way to tell when it is safe to restart Kudu to avoid these >>>>>>> issues? Should we use any special procedure when restarting (e.g. >>>>>>> only >>>>>>> restart the tablet servers, only restart one tablet server at a time >>>>>>> or >>>>>>> something like that)? >>>>>>> >>>>>>> The table design uses 50 tablets per day (times 90 days). It is 8 TB >>>>>>> of data after 3xreplication over 5 tablet servers. >>>>>>> >>>>>>> Thanks, >>>>>>> Petter >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> -- >>>>>> Andrew Wong >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> Andrew Wong >>>>> >>>>> >>>> >>>> >>> > > -- > David Alves > -- Todd Lipcon Software Engineer, Cloudera

[ANNOUNCE] New committers over past several months

2017-12-18 Thread Todd Lipcon
Hi Kudu community, I'm pleased to announce that the Kudu PMC has voted to add Andrew Wong, Grant Henke, and Hao Hao as Kudu committers and PMC members. This announcement is a bit delayed, but I figured it's better late than never! Andrew has contributed to Kudu in a bunch of areas. Most notably,

Re: INT128 Column Support Interest

2017-11-20 Thread Todd Lipcon
ht choose INT128 if available even if they have no need for anywhere near that range. -Todd > > On Thu, Nov 16, 2017 at 5:30 PM, Dan Burkert > wrote: > > > Aren't we going to need efficient encodings in order to make decimal work > > well, anyway? > > > > - Dan
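Dan's point about decimal encodings ties INT128 to DECIMAL storage. As a quick, hedged illustration (not Kudu code; the class name `Decimal128Range` is hypothetical), the check below shows that every unscaled value with up to 38 decimal digits, the maximum DECIMAL precision in engines such as Impala, fits in a signed 128-bit integer, while 39 digits do not:

```java
import java.math.BigInteger;

// Hypothetical helper: how many decimal digits always fit in a signed 128-bit integer?
class Decimal128Range {
    // Largest signed 128-bit value: 2^127 - 1.
    static final BigInteger INT128_MAX =
            BigInteger.ONE.shiftLeft(127).subtract(BigInteger.ONE);

    // True if every unscaled value with up to `precision` digits fits in int128.
    static boolean fitsInInt128(int precision) {
        BigInteger largest = BigInteger.TEN.pow(precision).subtract(BigInteger.ONE);
        return largest.compareTo(INT128_MAX) <= 0;
    }

    // Largest precision p such that all p-digit values fit.
    static int maxSafePrecision() {
        int p = 1;
        while (fitsInInt128(p + 1)) {
            p++;
        }
        return p;
    }

    public static void main(String[] args) {
        System.out.println("38 digits fit: " + fitsInInt128(38));
        System.out.println("39 digits fit: " + fitsInInt128(39));
        System.out.println("max safe precision: " + maxSafePrecision());
    }
}
```

Most values stored in such a column use only a small part of that range, which is presumably why encodings came up in the thread.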

Re: Kudu start error : Failed to initialize sys tables async: on-disk and provided master lists are different

2017-11-17 Thread Todd Lipcon
d master lists are different: 10.15.213.10:7051 10.15.213.11:7051 > 10.15.213.12:7051 :0` > > > > It was the same on the other 2 master machines. > > > > I have no idea what’s going on. Am I misunderstanding this configuration > option? > > > > > > > > Best wishes. > > > > Liou Fongcyuan > -- Todd Lipcon Software Engineer, Cloudera

Re: INT128 Column Support Interest

2017-11-16 Thread Todd Lipcon
ashes and other similar types >> of data. >> >> Is there any interest or uses for an INT128 column type? Is anyone >> currently using a STRING or BINARY column for 128-bit data? >> >> Thank you, >> Grant >> -- >> Grant Henke >> Software Engineer | Cloudera >> gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke >> > > -- Todd Lipcon Software Engineer, Cloudera
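For the 128-bit data Grant asks about (UUIDs, hashes and similar), the storage difference between a STRING and a BINARY column is easy to show. A minimal sketch using only `java.util.UUID` and `java.nio.ByteBuffer`; the class `Uuid128` and the sample UUID are made up for illustration:

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper: the same 128-bit value as a 36-char string vs. 16 raw bytes.
class Uuid128 {
    // Pack the UUID's 128 bits into 16 big-endian bytes (e.g. for a BINARY column).
    static byte[] toBytes(UUID u) {
        return ByteBuffer.allocate(16)
                .putLong(u.getMostSignificantBits())
                .putLong(u.getLeastSignificantBits())
                .array();
    }

    // Rebuild the UUID from its 16-byte form.
    static UUID fromBytes(byte[] b) {
        ByteBuffer buf = ByteBuffer.wrap(b);
        long msb = buf.getLong();
        long lsb = buf.getLong();
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        UUID u = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");
        System.out.println("string form: " + u.toString().length() + " chars");
        System.out.println("binary form: " + toBytes(u).length + " bytes");
        System.out.println("round trip ok: " + u.equals(fromBytes(toBytes(u))));
    }
}
```

A native 128-bit type would hold the same 16 bytes as the BINARY form, without the per-value length overhead of a variable-length column.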

Re: The service queue is full; it has 400 items.. Retrying in the next heartbeat period.

2017-11-13 Thread Todd Lipcon
es.apache.org/jira/browse/KUDU-1078, and the issue's status > has been reopened. I have uploaded logs for analyzing the issue. If you want more > detail, just tell me 😄. > log files: > https://drive.google.com/open?id=1_1l2xpT3-NmumgI_sIdxch-6BocXqTCt > https://drive.google.com/open?i
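The error in the subject line reports a bounded RPC service queue that has reached its capacity of 400 items. The sketch below is a toy simulation, not Kudu's RPC layer (`BoundedQueueSim` and the per-tick rates are hypothetical): requests that arrive while the queue is full are rejected, and if arrivals persistently exceed the service rate, a larger queue only postpones the rejections.

```java
import java.util.concurrent.ArrayBlockingQueue;

// Toy model of a bounded service queue (capacity 400 in the error message).
// Each tick, `arrivals` requests are offered and up to `served` are drained.
class BoundedQueueSim {
    static int rejectedAfter(int capacity, int arrivals, int served, int ticks) {
        ArrayBlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        int rejected = 0;
        for (int t = 0; t < ticks; t++) {
            for (int i = 0; i < arrivals; i++) {
                if (!queue.offer(i)) {
                    rejected++; // the "service queue is full" case
                }
            }
            for (int i = 0; i < served && !queue.isEmpty(); i++) {
                queue.poll();
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        // Service keeps up with arrivals: no rejections.
        System.out.println(rejectedAfter(400, 100, 100, 1000));
        // Arrivals exceed service by 10%: the queue fills and rejections start.
        System.out.println(rejectedAfter(400, 110, 100, 1000));
        // A 4x larger queue delays, but does not prevent, the same outcome.
        System.out.println(rejectedAfter(1600, 110, 100, 1000));
    }
}
```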

Re: The service queue is full; it has 400 items.. Retrying in the next heartbeat period.

2017-11-03 Thread Todd Lipcon
One thing you might try is to update the consensus rpc timeout to 30 seconds instead of 1. We changed the default in later versions. I'd also recommend updating to 1.4 or 1.5 for other related fixes to consensus stability. I think I recall you were on 1.3 still? Todd On Nov 3, 2017 7:47 PM, "Le

Re: Error message: 'Tried to update clock beyond the max. error.'

2017-11-01 Thread Todd Lipcon
d the max. > error. > > Franco > > > -- > *From: *"Todd Lipcon" > *To: *user@kudu.apache.org > *Sent: *Wednesday, November 1, 2017 8:00:09 PM > *Subject: *Re: Error message: 'Tried to update clock beyond the max. > error.' > > > What

Re: Error message: 'Tried to update clock beyond the max. error.'

2017-11-01 Thread Todd Lipcon
and even time values up to 1000 seconds in the future (we read 1 billion nanoseconds as 1 billion microseconds (=1000 seconds)). I'll work on reproducing this and a patch, to backport to previous versions. -Todd On Wed, Nov 1, 2017 at 5:00 PM, Todd Lipcon wrote: > What's the full l
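The arithmetic behind the bug Todd describes can be checked with `java.util.concurrent.TimeUnit`. This only illustrates the unit mix-up; it is not the actual Kudu code path, and the class and method names are hypothetical:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration: one raw value, interpreted with two different units.
class ClockUnitMixup {
    // Correct reading: the raw value is nanoseconds.
    static long secondsIfNanos(long raw) {
        return TimeUnit.NANOSECONDS.toSeconds(raw);
    }

    // Buggy reading: the same raw value treated as microseconds (1000x too large).
    static long secondsIfMicros(long raw) {
        return TimeUnit.MICROSECONDS.toSeconds(raw);
    }

    public static void main(String[] args) {
        long raw = 1_000_000_000L; // 1 billion
        System.out.println("read as nanoseconds:  " + secondsIfNanos(raw) + " s");
        System.out.println("read as microseconds: " + secondsIfMicros(raw) + " s");
    }
}
```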

Re: Error message: 'Tried to update clock beyond the max. error.'

2017-11-01 Thread Todd Lipcon
What's the full log line where you're seeing this crash? Is it coming from tablet_bootstrap.cc, raft_consensus.cc, or elsewhere? -Todd 2017-11-01 15:45 GMT-07:00 Franco Venturi : > Our version is kudu 1.5.0-cdh5.13.0. > > Franco > > > > > -- Todd Lipcon Software Engineer, Cloudera

Re: Low ingestion rate from Kafka

2017-11-01 Thread Todd Lipcon
's a torture-test cluster of sorts that is always way out of balance, re-replicating stuff, etc) -Todd > > > > On Wed, Nov 1, 2017 at 1:40 PM, Todd Lipcon wrote: > >> On Wed, Nov 1, 2017 at 1:23 PM, Chao Sun wrote: >> >>> Thanks Todd! I improved my code

Re: Kudu background tasks

2017-11-01 Thread Todd Lipcon
information about these background > operations? I want to understand what happens in situations when some node > is offline and then comes back up after a while. What is tablet > initialization and bootstrapping, etc. > > -- > Br. > Janne Keskitalo, > Database Architect, PAF.COM > For support: dbdsupp...@paf.com > > -- Todd Lipcon Software Engineer, Cloudera

Re: Low ingestion rate from Kafka

2017-11-01 Thread Todd Lipcon
kudu test loadgen" and may have fewer options available. -Todd On Wed, Nov 1, 2017 at 12:23 AM, Todd Lipcon wrote: > >> On Wed, Nov 1, 2017 at 12:20 AM, Todd Lipcon wrote: >> >>> Sounds good. >>> >>> BTW, you can try a quick load test using the

Re: Low ingestion rate from Kafka

2017-11-01 Thread Todd Lipcon
On Wed, Nov 1, 2017 at 12:20 AM, Todd Lipcon wrote: > Sounds good. > > BTW, you can try a quick load test using the 'kudu perf loadgen' tool. > For example something like: > > kudu perf loadgen my-kudu-master.example.com --num-threads=8 > --num-rows-per-thread

Re: Low ingestion rate from Kafka

2017-11-01 Thread Todd Lipcon
threads=8 --num-rows-per-thread=100 --table-num-buckets=32 There are also a bunch of options to tune buffer sizes, flush options, etc. But with the default settings above on an 8-node cluster I have, I was able to insert 8M rows in 44 seconds (180k/sec). Adding --buffer-size-bytes=1000 almos

Re: Low ingestion rate from Kafka

2017-10-31 Thread Todd Lipcon
ion on the UUID. This should ensure that you get pretty good batching of the writes. Todd > On Tue, Oct 31, 2017 at 6:25 PM, Todd Lipcon wrote: > >> In addition to what Zhen suggests, I'm also curious how you are sizing >> your batches in manual-flush mode? With 128 hash partiti
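Todd's question about batch sizing with 128 hash partitions can be made concrete with a back-of-the-envelope model. It assumes (a simplification, not the client's actual batching logic) that each row hashes uniformly and independently into one of B buckets and that a flush costs roughly one write RPC per bucket it touches. A batch of n rows then touches B * (1 - (1 - 1/B)^n) buckets on average; the class name `BatchFanout` is hypothetical:

```java
// Back-of-the-envelope model (an assumption): n rows hash uniformly into
// `buckets` partitions, and one flush costs ~one RPC per partition touched.
class BatchFanout {
    // Expected number of distinct buckets hit by n uniformly hashed rows.
    static double expectedBucketsTouched(int n, int buckets) {
        return buckets * (1.0 - Math.pow(1.0 - 1.0 / buckets, n));
    }

    // Expected number of rows carried by each per-partition RPC.
    static double rowsPerRpc(int n, int buckets) {
        return n / expectedBucketsTouched(n, buckets);
    }

    public static void main(String[] args) {
        int buckets = 128;
        for (int n : new int[] {100, 1000, 10000}) {
            System.out.printf("batch=%d  buckets touched=%.1f  rows/RPC=%.1f%n",
                    n, expectedBucketsTouched(n, buckets), rowsPerRpc(n, buckets));
        }
    }
}
```

With B = 128 and n = 1000, nearly all 128 buckets are touched, so each per-partition RPC carries only about 8 rows. Larger batches, or arranging for each writer to see keys from fewer partitions, raise that number.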

Re: Low ingestion rate from Kafka

2017-10-31 Thread Todd Lipcon
31 15:07 GMT+08:00 Chao Sun : > >> OK. Thanks! I changed to manual flush mode and it increased to ~15K / >> sec. :) >> >> Is there any other tuning I can do to further improve this? and also, how >> much would >> SSD help in this case (only upsert)? >> >

Re: Low ingestion rate from Kafka

2017-10-30 Thread Todd Lipcon
e.newInsert(); > PartialRow row = insert.getRow(); > // fill the columns > kuduSession.apply(insert); > } > > I didn't specify the flushing mode, so it will pick up the AUTO_FLUSH_SYNC > as default? > should I use MANUAL_FLUSH? > > Thanks, > Chao > > On

Re: Low ingestion rate from Kafka

2017-10-30 Thread Todd Lipcon
Hey Chao, Nice to hear you are checking out Kudu. What are you using to consume from Kafka and write to Kudu? Is it possible that it is Java code and you are using the SYNC flush mode? That would result in a separate round trip for each record and thus very low throughput. Todd On Oct 30, 2017
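To see why SYNC flushing is slow, consider a simplified cost model (an assumption for illustration, not the Kudu client API): with AUTO_FLUSH_SYNC each `apply()` waits for its own round trip, while MANUAL_FLUSH or AUTO_FLUSH_BACKGROUND send rows in batches. The class name and the 1 ms round-trip time below are hypothetical:

```java
// Simplified cost model (an assumption): only network round trips cost time.
class FlushRoundTrips {
    // Sync flush: every record waits for its own round trip.
    static long syncRoundTrips(long rows) {
        return rows;
    }

    // Batched flush: one round trip per batch of up to batchSize rows.
    static long batchedRoundTrips(long rows, long batchSize) {
        return (rows + batchSize - 1) / batchSize; // ceiling division
    }

    // Throughput bound if each round trip takes rttMillis and nothing else costs time.
    static double maxRowsPerSecond(long rows, long roundTrips, double rttMillis) {
        return rows / (roundTrips * rttMillis / 1000.0);
    }

    public static void main(String[] args) {
        long rows = 1_000_000;
        double rttMillis = 1.0; // hypothetical round-trip time
        long sync = syncRoundTrips(rows);
        long batched = batchedRoundTrips(rows, 1000);
        System.out.printf("sync:    %d round trips, <= %.0f rows/s%n",
                sync, maxRowsPerSecond(rows, sync, rttMillis));
        System.out.printf("batched: %d round trips, <= %.0f rows/s%n",
                batched, maxRowsPerSecond(rows, batched, rttMillis));
    }
}
```

The batched figure ignores per-row server cost, so it is only a loose upper bound; the point is that a single writer in sync mode is capped at roughly 1/RTT rows per second.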

Re: 答复: 答复: How kudu synchronize real-time records?

2017-10-26 Thread Todd Lipcon
t2, node3 t3 (t1 < > t2 < t3), then a reading client attached to node1 can see the record, but other > reading clients not attached to node1 (node2, node3) may miss > record1. > > > > I think that does not happen in Kudu, and I wonder how Kudu synchronizes > real-time data. > > > > Thanks! > > > > -- Todd Lipcon Software Engineer, Cloudera

Re: kudu 1.4 kerberos

2017-10-24 Thread Todd Lipcon
On Tue, Oct 24, 2017 at 12:41 PM, Todd Lipcon wrote: > I've filed https://issues.apache.org/jira/browse/KUDU-2198 to provide a > workaround for systems like this. I should have a patch up shortly since > it's relatively simple. > > ... and here's the patch, if

Re: kudu 1.4 kerberos

2017-10-24 Thread Todd Lipcon
Mon, Oct 16, 2017 at 2:29 PM, Matteo Durighetto < > m.durighe...@miriade.it> wrote: > > the "abcdefgh1234" is an example of the string created by > Cloudera Manager when enabling Kerberos. > > ... > > On Mon, Oct 16, 2017 at 11:57 PM, Todd

Re: [DISCUSS] Move Slack discussions to ASF official slack?

2017-10-23 Thread Todd Lipcon
kudu on the ASF slack in case we decide to go > forward with this. If we don't decide to go forward with it, it's a good > idea to hold onto the channel and pin a message in there about how to get > to the "official" Kudu slack. > > On Mon, Oct 23, 2017 at 3:00 PM,

Re: Kudu - Session.Configuration.FlushMode

2017-10-23 Thread Todd Lipcon
typically the best choice for a streaming ingest or bulk load scenario since it aims to manage buffer sizes for you automatically for best performance. We'll continue to invest in making AUTO_FLUSH_BACKGROUND work as well as possible for these scenarios. -Todd -- Todd Lipcon Software Engineer, Cloudera

Re: [DISCUSS] Move Slack discussions to ASF official slack?

2017-10-23 Thread Todd Lipcon
he official ASF slack (http://the-asf.slack.com/ > ) > and migrate our discussions there. What does everyone think? > -- Todd Lipcon Software Engineer, Cloudera

Re: question about connect kudu with python on windows

2017-10-23 Thread Todd Lipcon
Cannot find installed kudu client. >> >> >> Command "python setup.py egg_info" failed with error code 1 in >> c:\users\rani\appdata\local\temp\pip-build-7ildct\kudu-python >> >> -- Todd Lipcon Software Engineer, Cloudera

Re: kudu 1.4 kerberos

2017-10-16 Thread Todd Lipcon
apping and instead just use the "simple" mapping of using the short principal name? Generally we'd prefer to have as simple a configuration as possible but if your configuration is relatively commonplace it seems we might want an easier workaround than duplicating krb5.conf. -Todd >

Re: Service unavailable: Transaction failed, tablet 2758e5c68e974b92a3060db8575f3621 transaction memory consumption (67031036) has exceeded its limit (67108864) or the limit of an ancestral tracker

2017-10-12 Thread Todd Lipcon
" > c4ed5cb73f5644a8804d3abc976d02f8" member_type: VOTER last_known_addr { > host: "cloud-ocean-kudu-02" port: 7050 } } peers { permanent_uuid: " > 067e1e7245154f0fb2720dec6c77feec" member_type: VOTER last_known_addr { > host: "cloud-ocean-kudu-04" port: 7050 } } } (1 of 76249 similar) > > 2017-09-06 14:04 GMT+08:00 Lee King : > >> We got an error: Service unavailable: Transaction failed, tablet >> 2758e5c68e974b92a3060db8575f3621 transaction memory consumption >> (67031036) has exceeded its limit (67108864) or the limit of an ancestral >> tracker. It looks like https://issues.apache.org/jira/browse/KUDU-1912, >> and the bug will be fixed in 1.5, but our version is 1.4. Is there any effect >> on Kudu stability or data consistency? >> > > -- Todd Lipcon Software Engineer, Cloudera

Re: kudu 1.4 kerberos

2017-10-12 Thread Todd Lipcon
the kerberos configuration, but in typical configurations it's determined by the 'auth_to_local' configuration in your krb5.conf. See the corresponding section in the docs here: https://web.mit.edu/kerberos/krb5-1.12/doc/admin/conf_files/krb5_conf.html My guess is that your host has been configured such that when the master maps its own principal, it's getting a different result than when it maps the principal being used by the tservers. Hope that gets you on the right track. Thanks -Todd -- Todd Lipcon Software Engineer, Cloudera

Re: Change Data Capture (CDC) with Kudu

2017-09-29 Thread Todd Lipcon
for synchronous writes to Kudu), but we would like to have some pretty good >>>> confidence that the secondary instance contains all the changes that the >>>> primary has up to say an hour before (or something like that). >>>> >>>> >>>> So far we considered a couple of options: >>>> - refreshing the secondary instance with a full copy of the primary one >>>> every so often, but that would mean having to transfer say 50TB of data >>>> between the two locations every time, and our network bandwidth constraints >>>> would prevent us from doing that even on a daily basis >>>> - having a column that contains the most recent time a row was updated, >>>> however this column couldn't be part of the primary key (because the >>>> primary key in Kudu is immutable), and therefore finding which rows have >>>> been changed every time would require a full scan of the table to be >>>> sync'd. It would also rely on the "last update timestamp" column to be >>>> always updated by the application (an assumption that we would like to >>>> avoid), and would need some other process to take into account the rows >>>> that are deleted. >>>> >>>> >>>> Since many of today's RDBMS (Oracle, MySQL, etc) allow for some sort of >>>> 'Change Data Capture' mechanism where only the 'deltas' are captured and >>>> applied to the secondary instance, we were wondering if there's any way in >>>> Kudu to achieve something like that (possibly mining the WALs, since my >>>> understanding is that each change gets applied to the WALs first). >>>> >>>> >>>> Thanks, >>>> Franco Venturi >>>> >>> >> > > -- Todd Lipcon Software Engineer, Cloudera

Re: Retrieving multiple records by composite primary key via java api

2017-09-25 Thread Todd Lipcon
-- Todd Lipcon Software Engineer, Cloudera

Re: Please tell me about License regarding kudu logo usage

2017-09-19 Thread Todd Lipcon
Oops, adding the original poster in case he or she is not subscribed to the list. On Sep 19, 2017 10:46 PM, "Todd Lipcon" wrote: > Hi Yuya, > > There should be no problem to use the Apache Kudu logo in your conference > slides, assuming you are just using as intended to de
