[jira] [Updated] (HIVE-27590) Make LINES TERMINATED BY work when creating table
[ https://issues.apache.org/jira/browse/HIVE-27590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lvhu updated HIVE-27590:
    Priority: Blocker  (was: Major)

> Make LINES TERMINATED BY work when creating table
> -------------------------------------------------
>
>                 Key: HIVE-27590
>                 URL: https://issues.apache.org/jira/browse/HIVE-27590
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive, SQL
>    Affects Versions: 3.1.3
>         Environment: Any
>            Reporter: lvhu
>            Assignee: lvhu
>            Priority: Blocker
>
> *Currently, the only way to set a line delimiter when creating a table in Hive is like this:*
> {code:java}
> package abc.hive;
> public class MyFstTextInputFormat extends FileInputFormat<LongWritable, Text>
>     implements JobConfigurable {
>   ...
> }
> create table test (
>   id string,
>   name string
> )
> INPUTFORMAT 'abc.hive.MyFstTextInputFormat'
> {code}
> If tables use several different record delimiters, a separate custom TextInputFormat has to be written for each one.
> Unfortunately, the ideal approach is not yet supported:
> {code:java}
> create table test (
>   id string,
>   name string
> )
> row format delimited fields terminated by '\t'  -- supported
> LINES TERMINATED BY '|@|' ;  -- not supported
> {code}
> I have a solution that supports setting the line delimiter at table creation, exactly as shown above.
> *1. Create a new HiveTextInputFormat class to replace the TextInputFormat class.*
> The HiveTextInputFormat class reads a configuration file so that the record delimiter for input files can be set based on the prefix of the file path.
> {code:java}
> public class HiveTextInputFormat extends FileInputFormat<LongWritable, Text>
>     implements JobConfigurable {
>
>   public RecordReader<LongWritable, Text> getRecordReader(
>       InputSplit genericSplit, JobConf job, Reporter reporter)
>       throws IOException {
>
>     reporter.setStatus(genericSplit.toString());
>     // Default delimiter
>     String delimiter = job.get("textinputformat.record.delimiter");
>     // Obtain the path of the file (InputSplit itself has no getPath())
>     String filePath = ((FileSplit) genericSplit).getPath().toUri().getPath();
>     // Obtain the mapping from file paths to delimiters by parsing the file
>     Map<String, String> pathToDelimiterMap = parsePathToDelimite();
>     for (Map.Entry<String, String> entry : pathToDelimiterMap.entrySet()) {
>       // Configured path
>       String configPath = entry.getKey();
>       // If configPath is a prefix of filePath, use the delimiter configured for that path
>       if (filePath.startsWith(configPath)) delimiter = entry.getValue();
>     }
>     byte[] recordDelimiterBytes = null;
>     if (null != delimiter) {
>       recordDelimiterBytes = delimiter.getBytes(Charsets.UTF_8);
>     }
>     return new LineRecordReader(job, (FileSplit) genericSplit, recordDelimiterBytes);
>   }
> }
> {code}
> *2. Modify Hive's create-table class to support:*
> {code:java}
> create table test (
>   id string,
>   name string
> )
> LINES TERMINATED BY '|@|'
> LOCATION hdfs_path;
> {code}
> When a user executes the SQL above, Hive will insert (hdfs_path, '|@|') into the configuration file.
> HiveTextInputFormat is then set as the default INPUTFORMAT.
> Looking forward to receiving your suggestions and feedback!
> *If you accept my idea, I hope you can assign the task to me. My GitHub account is: _lvhu-goodluck_*
> I really hope to contribute code to the community.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
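The prefix lookup in the proposed HiveTextInputFormat can be sketched in isolation. This is a minimal, hypothetical illustration and not code from Hive or Hadoop: the class `DelimiterResolver` and its methods are invented names, and it assumes the path-to-delimiter mapping has already been loaded into a `Map`. The loop in the proposal keeps whichever matching entry happens to be iterated last and uses a plain `startsWith`. This sketch instead picks the longest matching prefix and only matches on directory boundaries. Both choices are assumptions about the desired behavior, not something the issue specifies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Hive or Hadoop. */
final class DelimiterResolver {

    private DelimiterResolver() {}

    /**
     * Returns the delimiter registered for the longest configured path that is
     * a directory-level prefix of filePath, or defaultDelimiter if none matches.
     */
    static String resolve(Map<String, String> pathToDelimiter,
                          String filePath,
                          String defaultDelimiter) {
        String bestPrefix = null;
        String delimiter = defaultDelimiter;
        for (Map.Entry<String, String> entry : pathToDelimiter.entrySet()) {
            String prefix = entry.getKey();
            if (isPathPrefix(prefix, filePath)
                    && (bestPrefix == null || prefix.length() > bestPrefix.length())) {
                bestPrefix = prefix;
                delimiter = entry.getValue();
            }
        }
        return delimiter;
    }

    /** True when prefix equals path or names one of its parent directories. */
    static boolean isPathPrefix(String prefix, String path) {
        if (!path.startsWith(prefix)) {
            return false;
        }
        return path.length() == prefix.length()
                || prefix.endsWith("/")
                || path.charAt(prefix.length()) == '/';
    }

    public static void main(String[] args) {
        // Assumed example mapping; the paths and the "##" delimiter are made up.
        Map<String, String> config = new LinkedHashMap<>();
        config.put("/warehouse/test", "|@|");
        config.put("/warehouse/test/archive", "##");

        System.out.println(resolve(config, "/warehouse/test/part-0", "\n"));         // |@|
        System.out.println(resolve(config, "/warehouse/test/archive/part-1", "\n")); // ##
    }
}
```

With a plain `startsWith`, a table stored at `/warehouse/testing` would inherit the delimiter configured for `/warehouse/test`; the directory-boundary check avoids that, at the cost of a slightly longer comparison.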
[jira] [Updated] (HIVE-27590) Make LINES TERMINATED BY work when creating table
[ https://issues.apache.org/jira/browse/HIVE-27590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lvhu updated HIVE-27590:
    Environment: Any  (was: {code:java} // code placeholder {code})

> Make LINES TERMINATED BY work when creating table
> -------------------------------------------------
>
>                 Key: HIVE-27590
>                 URL: https://issues.apache.org/jira/browse/HIVE-27590
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive, SQL
>    Affects Versions: 3.1.3
>         Environment: Any
>            Reporter: lvhu
>            Assignee: lvhu
>            Priority: Major
[jira] [Updated] (HIVE-27590) Make LINES TERMINATED BY work when creating table
[ https://issues.apache.org/jira/browse/HIVE-27590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lvhu updated HIVE-27590:
    Description:
*Currently, the only way to set a line delimiter when creating a table in Hive is like this:*
{code:java}
package abc.hive;
public class MyFstTextInputFormat extends FileInputFormat<LongWritable, Text>
    implements JobConfigurable {
  ...
}
create table test (
  id string,
  name string
)
INPUTFORMAT 'abc.hive.MyFstTextInputFormat'
{code}
If tables use several different record delimiters, a separate custom TextInputFormat has to be written for each one.
Unfortunately, the ideal approach is not yet supported:
{code:java}
create table test (
  id string,
  name string
)
row format delimited fields terminated by '\t'  -- supported
LINES TERMINATED BY '|@|' ;  -- not supported
{code}
I have a solution that supports setting the line delimiter at table creation, exactly as shown above.
*1. Create a new HiveTextInputFormat class to replace the TextInputFormat class.*
The HiveTextInputFormat class reads a configuration file so that the record delimiter for input files can be set based on the prefix of the file path.
{code:java}
public class HiveTextInputFormat extends FileInputFormat<LongWritable, Text>
    implements JobConfigurable {

  public RecordReader<LongWritable, Text> getRecordReader(
      InputSplit genericSplit, JobConf job, Reporter reporter)
      throws IOException {

    reporter.setStatus(genericSplit.toString());
    // Default delimiter
    String delimiter = job.get("textinputformat.record.delimiter");
    // Obtain the path of the file (InputSplit itself has no getPath())
    String filePath = ((FileSplit) genericSplit).getPath().toUri().getPath();
    // Obtain the mapping from file paths to delimiters by parsing the file
    Map<String, String> pathToDelimiterMap = parsePathToDelimite();
    for (Map.Entry<String, String> entry : pathToDelimiterMap.entrySet()) {
      // Configured path
      String configPath = entry.getKey();
      // If configPath is a prefix of filePath, use the delimiter configured for that path
      if (filePath.startsWith(configPath)) delimiter = entry.getValue();
    }
    byte[] recordDelimiterBytes = null;
    if (null != delimiter) {
      recordDelimiterBytes = delimiter.getBytes(Charsets.UTF_8);
    }
    return new LineRecordReader(job, (FileSplit) genericSplit, recordDelimiterBytes);
  }
}
{code}
*2. Modify Hive's create-table class to support:*
{code:java}
create table test (
  id string,
  name string
)
LINES TERMINATED BY '|@|'
LOCATION hdfs_path;
{code}
When a user executes the SQL above, Hive will insert (hdfs_path, '|@|') into the configuration file.
HiveTextInputFormat is then set as the default INPUTFORMAT.
Looking forward to receiving your suggestions and feedback!
*If you accept my idea, I hope you can assign the task to me. My GitHub account is: _lvhu-goodluck_*
I really hope to contribute code to the community.

  was:
*Currently, the only way to set a line delimiter when creating a table in Hive is like this:*
{code:java}
package abc.hive;
public class MyFstTextInputFormat extends FileInputFormat<LongWritable, Text>
    implements JobConfigurable {
  ...
}
create table test (
  id string,
  name string
)
INPUTFORMAT 'abc.hive.MyFstTextInputFormat'
{code}
If tables use several different record delimiters, a separate custom TextInputFormat has to be written for each one.
Unfortunately, the ideal approach is not yet supported:
{code:java}
create table test (
  id string,
  name string
)
row format delimited fields terminated by '\t'  -- supported
LINES TERMINATED BY '|@|' ;  -- not supported
{code}
I have a solution that supports setting the line delimiter at table creation, exactly as shown above.
*1. Create a new HiveTextInputFormat class to replace the TextInputFormat class.*
The HiveTextInputFormat class reads a configuration file so that the record delimiter for input files can be set based on the prefix of the file path.
{code:java}
public class HiveTextInputFormat extends FileInputFormat<LongWritable, Text>
    implements JobConfigurable {

  public RecordReader<LongWritable, Text> getRecordReader(
      InputSplit genericSplit, JobConf job, Reporter reporter)
      throws IOException {

    reporter.setStatus(genericSplit.toString());
    // Default delimiter
    String delimiter = job.get("textinputformat.record.delimiter");
    // Obtain the path of the file (InputSplit itself has no getPath())
    String filePath = ((FileSplit) genericSplit).getPath().toUri().getPath();
    // Obtain the mapping from file paths to delimiters by parsing the file
    Map<String, String> pathToDelimiterMap = parsePathToDelimite();
    for (Map.Entry<String, String> entry : pathToDelimiterMap.entrySet()) {
      // Configured path
      String configPath = entry.getKey();
      // If configPath is a prefix of filePath, use the delimiter configured for that path
      if (filePath.startsWith(configPath))