[jira] [Commented] (PHOENIX-1056) A ImportTsv tool for phoenix to build table data and all index data.
[ https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057258#comment-14057258 ] James Taylor commented on PHOENIX-1056: --- Good point, [~jaywong].

A ImportTsv tool for phoenix to build table data and all index data.
Key: PHOENIX-1056
URL: https://issues.apache.org/jira/browse/PHOENIX-1056
Project: Phoenix
Issue Type: Task
Affects Versions: 3.0.0
Reporter: jay wong
Fix For: 3.1
Attachments: PHOENIX-1056.patch

I have built a tool that builds table data and index table data, much like the HBase ImportTsv job: http://hbase.apache.org/book/ops_mgt.html#importtsv When ImportTsv runs, it writes HFiles into one path per column family. For example, for a table with two column families, A and B, the output is ./outputpath/A and ./outputpath/B. In my job, for a table TableOne with two indexes, IdxOne and IdxTwo, the output is ./outputpath/TableOne/A, ./outputpath/TableOne/B, ./outputpath/IdxOne, and ./outputpath/IdxTwo. If anyone needs it, I will publish a cleaned-up version of the tool.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[ https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055974#comment-14055974 ] James Taylor commented on PHOENIX-1056: --- Thanks, [~jaywong]. That's a good improvement, building both the table data and the index data in a single job. Open issues are:
- Do we need both a CSV bulk loader and an ImportTsv tool? How are they different? Or can the improvements you made be folded into the CSV bulk loader instead? If we do need both, can the ImportTsv tool be built on top of the CSV bulk loader?
- The CSV bulk loader uses publicly exposed Phoenix APIs to get at the underlying KeyValues and uses the Phoenix table metadata to drive the import, while the ImportTsv tool requires the column information to be passed in through a somewhat awkward manner (leaving room for discrepancies between the real schema and the one passed in). The ImportTsv tool should go through the same Phoenix APIs as the CSV bulk loader, IMO.
Thoughts? I'd be interested in your opinions, [~gabriel.reid] and [~maghamravikiran].
[ https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056538#comment-14056538 ] Jeffrey Zhong commented on PHOENIX-1056: The other issues can be easily addressed, except for aligning the index HFiles with region boundaries during the MR job; otherwise LoadIncrementalHFiles becomes a heavy operation. [~jaywong] Have you tried the ImportTsv tool internally, so you could see the performance difference between one single MR job (plus loading the HFiles) and multiple MR jobs running concurrently?
[ https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057040#comment-14057040 ] Jeffrey Zhong commented on PHOENIX-1056: {quote} Sometimes building all the data in a single MR job is not only for performance, but also for data consistency. {quote} But for bulk loading, I think we can safely make the assumption that the input data won't change, no?
[ https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055150#comment-14055150 ] James Taylor commented on PHOENIX-1056: --- What functionality does this add above and beyond our bulk CSV loader? http://phoenix.apache.org/bulk_dataload.html One thing may be the ability to import both data and indexes? It'd be a nice addition to the bulk CSV loader to bulk load indexes together with their data.
[ https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055387#comment-14055387 ] Jeffrey Zhong commented on PHOENIX-1056: Oh, I'm late to see this JIRA. I had a different patch that loads index table data in one go by submitting multiple MR jobs for CsvBulkLoadTool to load the data concurrently. [~jaywong]'s approach uses one MR job to build the data and index data together. I checked the patch, and the underlying idea is very good. But it has one issue: the partitioning is based on the primary table. Therefore, the index table HFiles aren't aligned with the index table's own partitioning, and loading those generated index HFiles will incur extra writes. Let me first create a separate JIRA to improve CsvBulkLoadTool to build indexes during loading time, and later we can decide whether to migrate CsvBulkLoadTool to use this JIRA's custom mapper, reducer, and MultiHFileOutputFormat.
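The alignment problem described above can be illustrated with a small, self-contained sketch (not Phoenix or HBase code; the boundary keys are made up): when reducer partitions follow the data table's region boundaries, an index HFile's key range can straddle several index-table regions, and any HFile that overlaps more than one region must be split by LoadIncrementalHFiles before loading, costing extra I/O.

```python
import bisect

def regions_spanned(first_key, last_key, region_split_keys):
    """Count how many regions the key range [first_key, last_key] overlaps.

    region_split_keys is the sorted list of region start keys (the first
    region implicitly starts at the empty key). An HFile spanning more
    than one region must be split before it can be bulk loaded.
    """
    lo = bisect.bisect_right(region_split_keys, first_key)
    hi = bisect.bisect_right(region_split_keys, last_key)
    return hi - lo + 1

# Reducer partitions were derived from the data table's boundaries, so a
# data HFile with keys in ["a", "f"] fits exactly one data-table region:
data_splits = ["g", "p"]            # 3 regions: (-inf,g) [g,p) [p,+inf)
assert regions_spanned("a", "f", data_splits) == 1

# The index table has different boundaries, so an index HFile produced
# under the same partitioning can overlap several index-table regions:
index_splits = ["c", "k", "t"]      # 4 index regions
assert regions_spanned("a", "f", index_splits) == 2
```

The fix Jeffrey hints at would be to partition each table's output by that table's own region boundaries, so every HFile lands inside a single region.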
[ https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055809#comment-14055809 ] jay wong commented on PHOENIX-1056: --- [~jamestaylor] Yes, it creates both the table data and the index data (HFiles) in a single job. The patch is an alpha version; I built it as a preview. Eventually it will be part of CsvImportTsv.
[ https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14053505#comment-14053505 ] jay wong commented on PHOENIX-1056: --- The normal HBase ImportTsv is:

bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:a,cf:b,cf:c -Dimporttsv.bulk.output=hdfs://storefile-outputdir tablename hdfs-inputdir

The Phoenix ImportTsv is this, and it supports Phoenix data types:

bin/hbase org.apache.hadoop.hbase.mapreduce.PhoneixImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,CF:A:PH_INT,CF:B:PH_BIGINT,cf:c -Dimporttsv.index.all=true -Dimporttsv.bulk.output=hdfs://storefile-outputdir tablename hdfs-inputdir

If the primary key is a multi-column key, the following rule is supported: replace HBASE_ROW_KEY with HBASE_ROW_KEY^CF1:Q1:PH_INT^CF2:Q2^0^CF1:Q3:PH_INT

Parameters:
-Dimporttsv.index.all=true — whether to build all index table data; default is false.
-Dimporttsv.build.table=true — whether to build the data table; default is true.
-Dimporttsv.index.names=INDEX1,INDEX2 — which index tables to build; requires -Dimporttsv.index.all=false.
[ https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14053507#comment-14053507 ] jay wong commented on PHOENIX-1056: --- Anyway, my ImportTsv supports escaped-character and unicode separators, whereas the Apache code only supports single-byte separators, just like -Dimporttsv.separator=\001 -Dimporttsv.separator=\u0019
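The kind of decoding such separator arguments need can be sketched as follows. This is an assumption about the approach, not the patch's actual code (the patch is in Java; `decode_separator` is a hypothetical helper): the argument arrives as the literal characters `\001` or `\u0019`, and the tool must turn that escape sequence into the real separator character before splitting lines.

```python
def decode_separator(arg: str) -> str:
    """Turn an escaped separator argument such as r"\\001" (octal) or
    r"\\u0019" (unicode) into the actual separator character.

    Python's 'unicode_escape' codec understands both escape forms, so it
    stands in here for the decoding a -Dimporttsv.separator option needs.
    """
    return arg.encode("ascii").decode("unicode_escape")

# A line whose fields are delimited by the control character U+0019:
line = "col1\x19col2\x19col3"
sep = decode_separator(r"\u0019")
assert line.split(sep) == ["col1", "col2", "col3"]
```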
[ https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14052268#comment-14052268 ] James Taylor commented on PHOENIX-1056: --- Fantastic work, [~jaywong]! Would love to get this into Phoenix. Can you send us a pull request so folks can review it?