Re: DDL for CarbonData table backup and recovery (new feature)

2017-11-26 Thread Mohammad Shahid Khan
Hi Ravindra & Likun
I am freezing the design and going to start coding.
Please get back to me if there are any issues.

--Regards,
   Shahid

On Fri, Nov 24, 2017 at 12:40 PM, mohdshahidkhan <
mohdshahidkhan1...@gmail.com> wrote:

> *Please note the updated solution:
> instead of passing the dbLocation, the database name will be passed in the
> REGISTER DDL.*
> CarbonData table backup and recovery
> Background
> A customer has created a CarbonData table into which a very large amount of
> data has already been loaded. They now install another cluster that should
> use the same data as this table without loading it again, because loading
> the data takes a long time. They therefore want to back up this table's data
> directly and recover it in the other cluster. After recovery, the user can
> work with the data as a normal CarbonData table.
> Requirement Description
> A CarbonData table should support backing up its data and recovering it
> without needing to load the data again.
> To reuse a CarbonData table from another cluster, a DDL command should be
> provided that creates the CarbonData table from the existing carbon table
> schema.
> Solution
> Currently CarbonData has the below types of tables:
> 1.   Normal table
> 2.   Pre-aggregate table
> CarbonData should provide a DDL command to create a table from existing
> table data.
> The below DDL command could be used to create the table from existing table
> data:
>
>   REGISTER TABLES FOR $DBName;
>
>    i.  The database path will be retrieved from the Hive catalog, and
>        the database path will be scanned to find all tables.
>   ii.  Each table schema will be read to get the column details.
>  iii.  The table will be registered in the Hive catalog with the below
>        details (a sketch of the whole flow follows the DDL):
> CREATE TABLE $tbName USING carbondata OPTIONS (
>   tableName "$dbName.$tbName",
>   dbName "$dbName",
>   tablePath "$tablePath",
>   path "$tablePath" )
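>
> A minimal sketch of this registration flow, assuming the public Spark 2.x
> catalog API; the schema-reading step is left as a hypothetical helper, since
> the actual CarbonData schema reader is not shown here:
>
> import org.apache.hadoop.fs.{FileSystem, Path}
> import org.apache.spark.sql.SparkSession
>
> def registerTables(spark: SparkSession, dbName: String): Unit = {
>   // i. Resolve the database location from the Hive catalog.
>   val dbLocation = spark.catalog.getDatabase(dbName).locationUri
>   val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
>   // Scan the database path; each sub-directory is a candidate table.
>   fs.listStatus(new Path(dbLocation)).filter(_.isDirectory).foreach { dir =>
>     val tbName = dir.getPath.getName
>     val tablePath = dir.getPath.toString
>     // ii. Read the column details from the existing schema file
>     //     (hypothetical helper, not defined here): readTableSchema(tablePath)
>     // iii. Register the table in the Hive catalog.
>     spark.sql(
>       s"""CREATE TABLE $tbName USING carbondata OPTIONS (
>          |  tableName "$dbName.$tbName",
>          |  dbName "$dbName",
>          |  tablePath "$tablePath",
>          |  path "$tablePath")""".stripMargin)
>   }
> }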
>
> Precondition:
>    i.  The user has to create the database, and before executing this
>        command the old table schema and data should be copied into the
>        database location (a copy sketch follows this list).
>   ii.  If the table has aggregate tables, then all the aggregate tables
>        should be copied into the database location as well.
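>
> A minimal copy sketch using the Hadoop FileUtil API; both paths below are
> illustrative assumptions, not part of the design:
>
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileUtil, Path}
>
> val conf = new Configuration()
> // Assumed paths: old cluster table folder -> new cluster database location.
> val src = new Path("hdfs://oldcluster/user/hive/warehouse/mydb.db/sales")
> val dst = new Path("hdfs://newcluster/user/hive/warehouse/mydb.db/sales")
> // Copies the table folder (schema + data); deleteSource = false keeps the
> // original table intact on the old cluster.
> FileUtil.copy(src.getFileSystem(conf), src,
>               dst.getFileSystem(conf), dst,
>               false, conf)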
>
> Validation:
>    1. If the database does not exist, the registration will fail.
>    2. The table will be registered only if a table with the same name is
>       not already registered.
>    3. If the table has aggregate tables, then all the aggregate tables
>       should be registered in the Hive catalog; if any of the aggregate
>       tables does not exist, the table creation operation should fail
>       (a validation sketch follows this list).
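>
> A sketch of these validations using the public Spark catalog API; the
> aggregate-table check is simplified to a folder-existence test, and all
> names are illustrative:
>
> import org.apache.hadoop.fs.{FileSystem, Path}
> import org.apache.spark.sql.SparkSession
>
> def validateRegistration(spark: SparkSession, dbName: String,
>                          tbName: String, aggTableNames: Seq[String]): Unit = {
>   // 1. The database must already exist.
>   require(spark.catalog.databaseExists(dbName),
>     s"Database $dbName does not exist")
>   // 2. Fail if a table with the same name is already registered.
>   require(!spark.catalog.tableExists(dbName, tbName),
>     s"Table $dbName.$tbName is already registered")
>   // 3. Every aggregate table folder must exist under the database location.
>   val dbLocation = spark.catalog.getDatabase(dbName).locationUri
>   val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
>   aggTableNames.foreach { agg =>
>     require(fs.exists(new Path(dbLocation, agg)),
>       s"Aggregate table folder $agg not found under $dbLocation")
>   }
> }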
>
>
>
>


Re: DDL for CarbonData table backup and recovery (new feature)

2017-11-26 Thread mohdshahidkhan
Thanks for the clarification, Naresh.
Please find my answers below.

Actually, if the export command is on a CarbonData table, we can just zip the
actual table folder & associated agg table folders into the user-mentioned
location. It doesn't export metadata.
Copying data from one cluster to the other will still remain the same in your
approach also.
A. Agreed, we don't want to export the data; the user simply has the tables
from the previous cluster and wants to use them, so to use them he has to
register them with Hive.

After copying data into the new cluster, how do we synchronize incremental
loads or schema evolution from the old cluster to the new cluster?
Should we drop the table in the new cluster, copy the data from the old
cluster to the new cluster & recreate the table again?
A. Synchronization from the old cluster to the new one is not in scope.

I think creating a CarbonData table requires schema information also to be
passed:
CREATE TABLE $dbName.$tbName (${ fields.map(f => f.rawSchema).mkString(",") })
USING CARBONDATA OPTIONS (tableName "$tbName", dbName "$dbName",
tablePath "$tablePath")
A. Agreed, will take the same (an illustrative expansion follows below).
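
An illustrative expansion of that DDL for a hypothetical table mydb.sales;
the column list and path below are assumptions, not from the design:

spark.sql(
  """CREATE TABLE mydb.sales (id INT, name STRING, amount DOUBLE)
    |USING CARBONDATA OPTIONS (
    |  tableName "sales",
    |  dbName "mydb",
    |  tablePath "hdfs://newcluster/user/hive/warehouse/mydb.db/sales")""".stripMargin)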






Re: Blog on how to use Carbondata with Presto

2017-11-26 Thread Liang Chen
Hi Bhavya,

Thanks for sharing, it is a nice blog.

Regards
Liang


bhavya411 wrote
> Hi All,
> 
> Please look at the blog to see how we can use CarbonData with Presto.
> 
> 
> https://blog.knoldus.com/2017/11/20/integrating-presto-with-carbondata/
> 
> 
> 
> Thanks and regards
> Bhavya




