Re: How to rebalance a table without converting to dataset

2018-04-17 Thread Shuyi Chen
Hi Darshan, thanks for raising the problem. We do have similar use of
rebalancing in Flink SQL, where we want to rebalance the Kafka input with
more partitions to increase parallelism in streaming.

As Fabian suggests, rebalancing is not relation algebra. The closest use of
the operation I can find in databases is in vertica (REBALANCE_TABLE
),
but it's used more as a one-time rebalance operation of the data after
adding/removing nodes. However, on the data processing side, I think this
might deserve more attention because we can't easily modify the input data
source, e.g. number of Kafka partitions.

The closest I can think of to enable this feature is DDL in SQL, e.g.
something like ALTER TABLE REBALANCE in SQL. With this DDL statement, it
will cause a rebalance() call when StreamTableSource.getDataStream
or BatchTableSource.getDataSet is invoked. In such case, we dont need to
touch the parser or planner in Calcite. For the table API, a possible
solution would be to add a rebalance() api in table API, but will need to
close out the LogicalPlan every time rebalance() is called, so we won't
need to touch the Calcite planner.

On Mon, Apr 16, 2018 at 5:41 AM, Fabian Hueske  wrote:

> Hi Darshan,
>
> You are right. there's currently no rebalancing operation on the Table
> API.
> I see that this might be a good feature, not sure though how easy it would
> be to integrate because we need to pass it through the Calcite optimizer
> and rebalancing is not a relational operation.
>
> For now, converting to DataSet and back to Table is the only option.
>
> Best, Fabian
>
> 2018-04-13 14:33 GMT+02:00 Darshan Singh :
>
>> Hi
>>
>> I have a table and I want to rebalance the data so that each partition is
>> equal. I cna convert to dataset and rebalance and then convert to table.
>>
>> I couldnt find any rebalance on table api. Does anyone know any better
>> idea to rebalance table data?
>>
>> Thanks
>>
>
>


-- 
"So you have to trust that the dots will somehow connect in your future."


Re: How to rebalance a table without converting to dataset

2018-04-16 Thread Fabian Hueske
Hi Darshan,

You are right. there's currently no rebalancing operation on the Table API.
I see that this might be a good feature, not sure though how easy it would
be to integrate because we need to pass it through the Calcite optimizer
and rebalancing is not a relational operation.

For now, converting to DataSet and back to Table is the only option.

Best, Fabian

2018-04-13 14:33 GMT+02:00 Darshan Singh :

> Hi
>
> I have a table and I want to rebalance the data so that each partition is
> equal. I cna convert to dataset and rebalance and then convert to table.
>
> I couldnt find any rebalance on table api. Does anyone know any better
> idea to rebalance table data?
>
> Thanks
>


How to rebalance a table without converting to dataset

2018-04-13 Thread Darshan Singh
Hi

I have a table and I want to rebalance the data so that each partition is
equal. I cna convert to dataset and rebalance and then convert to table.

I couldnt find any rebalance on table api. Does anyone know any better idea
to rebalance table data?

Thanks