[jira] [Updated] (HIVE-27590) Make LINES TERMINATED BY work when creating table

2023-08-10 Thread lvhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lvhu updated HIVE-27590:

Priority: Blocker  (was: Major)

> Make LINES TERMINATED BY work when creating table
> -
>
> Key: HIVE-27590
> URL: https://issues.apache.org/jira/browse/HIVE-27590
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, SQL
>Affects Versions: 3.1.3
> Environment: Any
>Reporter: lvhu
>Assignee: lvhu
>Priority: Blocker
>
> *Currently, the only way to set a custom line delimiter when creating a table 
> in Hive is to write a custom InputFormat, like this:*
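> {code:java}
> // Java: a custom InputFormat, one per record delimiter
> package abc.hive;
> public class MyFstTextInputFormat extends FileInputFormat 
> implements JobConfigurable {
>  ...
> }
> -- HiveQL: reference the custom class when creating the table
> create table test  (  
>     id string,  
>     name string  
> )  
> STORED AS INPUTFORMAT 'abc.hive.MyFstTextInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'   {code}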
> Every distinct record delimiter therefore requires its own custom 
> TextInputFormat.
> Unfortunately, the ideal approach is not supported yet:
> {code:java}
> create table test  (  
>     id string,  
>     name string  
> )  
> row format delimited fields terminated by '\t'  -- supported
> LINES TERMINATED BY '|@|' ;   -- not supported  {code}
> I have a solution that supports setting the line delimiter at table creation, 
> exactly as in the statement above.
> *1. Create a new HiveTextInputFormat class to replace the TextInputFormat class.*
> HiveTextInputFormat reads a file that maps path prefixes to record delimiters 
> and sets the record delimiter for each input file based on the prefix of its path.
> {code:java}
> public class HiveTextInputFormat extends FileInputFormat<LongWritable, Text>
>   implements JobConfigurable {
>   
>   // configure(JobConf), required by JobConfigurable, is omitted in this sketch
>   public RecordReader<LongWritable, Text> getRecordReader(
>       InputSplit genericSplit, JobConf job, Reporter reporter)
>     throws IOException {
>     
>     reporter.setStatus(genericSplit.toString());
>     // Default delimiter
>     String delimiter = job.get("textinputformat.record.delimiter");
>     // Path of the file backing this split (InputSplit itself has no getPath())
>     String filePath = ((FileSplit) genericSplit).getPath().toUri().getPath();
>     // Path-prefix-to-delimiter mapping, obtained by parsing the file
>     Map<String, String> pathToDelimiterMap = parsePathToDelimite();
>     for (Map.Entry<String, String> entry : pathToDelimiterMap.entrySet()) {
>       // Configured path prefix
>       String configPath = entry.getKey();
>       // If configPath is a prefix of filePath, use the delimiter configured for it
>       if (filePath.startsWith(configPath)) {
>         delimiter = entry.getValue();
>       }
>     }
>     byte[] recordDelimiterBytes = null;
>     if (null != delimiter) {
>       recordDelimiterBytes = delimiter.getBytes(Charsets.UTF_8);
>     }
>     return new LineRecordReader(job, (FileSplit) genericSplit,
>         recordDelimiterBytes);
>   }
> } {code}
> *2. Modify the Hive create-table class to support:*
> {code:java}
> create table test  (  
>     id string,  
>     name string  
> )  
> LINES TERMINATED BY '|@|'  
> LOCATION  hdfs_path; {code}
> When a user executes the SQL above, Hive inserts the entry (hdfs_path, '|@|') 
> into the file.
> Finally, set HiveTextInputFormat as the default INPUTFORMAT.
> I look forward to your suggestions and feedback!
> *If you accept my idea, I hope you can assign the task to me. My GitHub 
> account is: _lvhu-goodluck_*
> I really hope to contribute code to the community.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27590) Make LINES TERMINATED BY work when creating table

2023-08-10 Thread lvhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lvhu updated HIVE-27590:

Environment: Any  (was: {code:java}
// code placeholder
{code})

> Make LINES TERMINATED BY work when creating table
> -
>
> Key: HIVE-27590
> URL: https://issues.apache.org/jira/browse/HIVE-27590
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive, SQL
>Affects Versions: 3.1.3
> Environment: Any
>Reporter: lvhu
>Assignee: lvhu
>Priority: Major
>





[jira] [Updated] (HIVE-27590) Make LINES TERMINATED BY work when creating table

2023-08-10 Thread lvhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lvhu updated HIVE-27590:

Description: 
*Currently, the only way to set a custom line delimiter when creating a table 
in Hive is to write a custom InputFormat, like this:*
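{code:java}
// Java: a custom InputFormat, one per record delimiter
package abc.hive;
public class MyFstTextInputFormat extends FileInputFormat 
implements JobConfigurable {
 ...
}
-- HiveQL: reference the custom class when creating the table
create table test  (  
    id string,  
    name string  
)  
STORED AS INPUTFORMAT 'abc.hive.MyFstTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'   {code}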
Every distinct record delimiter therefore requires its own custom 
TextInputFormat.

Unfortunately, the ideal approach is not supported yet:
{code:java}
create table test  (  
    id string,  
    name string  
)  
row format delimited fields terminated by '\t'  -- supported
LINES TERMINATED BY '|@|' ;   -- not supported  {code}
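As an illustration of the intended semantics, the sketch below uses plain Java (no Hive or Hadoop classes) to split text into rows on '|@|' and into columns on '\t'. The class DelimitedTextSketch and its parse method are hypothetical names introduced only for this example; they are not part of Hive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hive: splits raw text into rows and fields.
final class DelimitedTextSketch {

    // Splits data into records on lineDelim, then each record into fields on fieldDelim.
    static List<List<String>> parse(String data, String lineDelim, String fieldDelim) {
        List<List<String>> rows = new ArrayList<>();
        // Pattern.quote makes the delimiter literal ('|' is a regex metacharacter).
        for (String record : data.split(Pattern.quote(lineDelim))) {
            if (record.isEmpty()) {
                continue; // skip empty records, e.g. between two adjacent terminators
            }
            // Limit -1 keeps trailing empty fields instead of dropping them.
            rows.add(Arrays.asList(record.split(Pattern.quote(fieldDelim), -1)));
        }
        return rows;
    }

    public static void main(String[] args) {
        String data = "1\talice|@|2\tbob|@|";
        System.out.println(parse(data, "|@|", "\t"));
    }
}
```

With the sample input in main, this prints [[1, alice], [2, bob]]: two rows of two columns each, which is what the unsupported statement above would be expected to produce for such a file.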
I have a solution that supports setting the line delimiter at table creation, 
exactly as in the statement above.

*1. Create a new HiveTextInputFormat class to replace the TextInputFormat class.*

HiveTextInputFormat reads a file that maps path prefixes to record delimiters 
and sets the record delimiter for each input file based on the prefix of its path.
{code:java}
public class HiveTextInputFormat extends FileInputFormat<LongWritable, Text>
  implements JobConfigurable {
  
  // configure(JobConf), required by JobConfigurable, is omitted in this sketch
  public RecordReader<LongWritable, Text> getRecordReader(
      InputSplit genericSplit, JobConf job, Reporter reporter)
    throws IOException {
    
    reporter.setStatus(genericSplit.toString());
    // Default delimiter
    String delimiter = job.get("textinputformat.record.delimiter");
    // Path of the file backing this split (InputSplit itself has no getPath())
    String filePath = ((FileSplit) genericSplit).getPath().toUri().getPath();
    // Path-prefix-to-delimiter mapping, obtained by parsing the file
    Map<String, String> pathToDelimiterMap = parsePathToDelimite();
    for (Map.Entry<String, String> entry : pathToDelimiterMap.entrySet()) {
      // Configured path prefix
      String configPath = entry.getKey();
      // If configPath is a prefix of filePath, use the delimiter configured for it
      if (filePath.startsWith(configPath)) {
        delimiter = entry.getValue();
      }
    }
    byte[] recordDelimiterBytes = null;
    if (null != delimiter) {
      recordDelimiterBytes = delimiter.getBytes(Charsets.UTF_8);
    }
    return new LineRecordReader(job, (FileSplit) genericSplit,
        recordDelimiterBytes);
  }
} {code}
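The prefix lookup inside getRecordReader can be isolated and tested without Hadoop. The sketch below is one possible variant, not the proposed patch. It rests on two assumptions that the description does not state: when several configured prefixes match, the longest one wins (the loop above keeps whichever match comes last in map iteration order), and a prefix only matches on a directory boundary, so /warehouse/t1 does not match /warehouse/t10. PrefixDelimiterResolver and its methods are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hive: picks a record delimiter by path prefix.
final class PrefixDelimiterResolver {

    // True if filePath is dir itself or lies somewhere below dir.
    static boolean isUnder(String filePath, String dir) {
        String withSlash = dir.endsWith("/") ? dir : dir + "/";
        return filePath.equals(dir) || filePath.startsWith(withSlash);
    }

    // Returns the delimiter of the longest matching prefix, or defaultDelimiter if none matches.
    static String resolve(String filePath, Map<String, String> pathToDelimiter,
                          String defaultDelimiter) {
        String bestPrefix = null;
        String delimiter = defaultDelimiter;
        for (Map.Entry<String, String> entry : pathToDelimiter.entrySet()) {
            String prefix = entry.getKey();
            boolean longer = bestPrefix == null || prefix.length() > bestPrefix.length();
            if (longer && isUnder(filePath, prefix)) {
                bestPrefix = prefix;
                delimiter = entry.getValue();
            }
        }
        return delimiter;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("/warehouse/t1", "|@|");
        mapping.put("/warehouse/t1/part", "##");
        System.out.println(resolve("/warehouse/t1/part/000000_0", mapping, "\n"));
    }
}
```

With the sample mapping in main, the file under /warehouse/t1/part resolves to ## rather than |@|, because the more specific prefix takes precedence regardless of the order of the entries.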
*2. Modify the Hive create-table class to support:*
{code:java}
create table test  (  
    id string,  
    name string  
)  
LINES TERMINATED BY '|@|'  
LOCATION  hdfs_path; {code}
When a user executes the SQL above, Hive inserts the entry (hdfs_path, '|@|') 
into the file.
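The description does not specify the format of this file. Purely as an assumption for illustration, the sketch below stores one entry per line: the path, a tab, then the delimiter. DelimiterMappingFile, formatEntry, and parseEntries are hypothetical names, and the sketch works on lines in memory, leaving open where the file lives and how it is read or written.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hive. Assumed line format: "<path>\t<delimiter>".
final class DelimiterMappingFile {

    // Builds the line that CREATE TABLE would append for (tablePath, delimiter).
    static String formatEntry(String tablePath, String delimiter) {
        if (tablePath.indexOf('\t') >= 0 || tablePath.indexOf('\n') >= 0) {
            throw new IllegalArgumentException("path must not contain tab or newline");
        }
        if (delimiter.indexOf('\n') >= 0) {
            // A newline inside the delimiter would need escaping in this line format.
            throw new IllegalArgumentException("delimiter must not contain newline");
        }
        return tablePath + "\t" + delimiter;
    }

    // Parses lines back into a path-to-delimiter map, skipping lines without a path and tab.
    static Map<String, String> parseEntries(List<String> lines) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String line : lines) {
            int tab = line.indexOf('\t'); // split on the first tab only
            if (tab > 0) {
                mapping.put(line.substring(0, tab), line.substring(tab + 1));
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(formatEntry("/warehouse/test", "|@|"));
        System.out.println(parseEntries(lines));
    }
}
```

A map parsed this way is the kind of input the prefix loop in HiveTextInputFormat iterates over. A real implementation would also have to decide how concurrent CREATE TABLE statements update the file, which this sketch does not address.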

Finally, set HiveTextInputFormat as the default INPUTFORMAT.

I look forward to your suggestions and feedback!

*If you accept my idea, I hope you can assign the task to me. My GitHub account 
is: _lvhu-goodluck_*

I really hope to contribute code to the community.

  was:
*Currently, the only way to set a custom line delimiter when creating a table 
in Hive is to write a custom InputFormat, like this:*
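{code:java}
// Java: a custom InputFormat, one per record delimiter
package abc.hive;
public class MyFstTextInputFormat extends FileInputFormat 
implements JobConfigurable {
 ...
}
-- HiveQL: reference the custom class when creating the table
create table test  (  
    id string,  
    name string  
)  
STORED AS INPUTFORMAT 'abc.hive.MyFstTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'   {code}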
Every distinct record delimiter therefore requires its own custom 
TextInputFormat.

Unfortunately, the ideal approach is not supported yet:
{code:java}
create table test  (  
    id string,  
    name string  
)  
row format delimited fields terminated by '\t'  -- supported
LINES TERMINATED BY '|@|' ;   -- not supported  {code}
I have a solution that supports setting the line delimiter at table creation, 
exactly as in the statement above.

*1. Create a new HiveTextInputFormat class to replace the TextInputFormat class.* 
HiveTextInputFormat reads a file that maps path prefixes to record delimiters 
and sets the record delimiter for each input file based on the prefix of its path.
{code:java}
public class HiveTextInputFormat extends FileInputFormat<LongWritable, Text>
  implements JobConfigurable {
  
  public RecordReader<LongWritable, Text> getRecordReader(
      InputSplit genericSplit, JobConf job, Reporter reporter)
    throws IOException {
    
    reporter.setStatus(genericSplit.toString());
    // Default delimiter
    String delimiter = job.get("textinputformat.record.delimiter");
    // Path of the file backing this split (InputSplit itself has no getPath())
    String filePath = ((FileSplit) genericSplit).getPath().toUri().getPath();
    // Path-prefix-to-delimiter mapping, obtained by parsing the file
    Map<String, String> pathToDelimiterMap = parsePathToDelimite();
    for (Map.Entry<String, String> entry : pathToDelimiterMap.entrySet()) {
     // Configured path prefix
     String configPath = entry.getKey();
     // If configPath is a prefix of filePath, use the delimiter configured for it
     if(filePath.startsWith(configPath))