Re: Review Request 61522: SQOOP-2907 : Export parquet files to RDBMS: don't require .metadata for parquet files

2017-10-04 Thread Sandish Kumar HN


> On Oct. 2, 2017, 3:18 p.m., Zoltán Tóth wrote:
> > src/java/org/apache/sqoop/avro/AvroUtil.java
> > Lines 194 (patched)
> > 
> >
> > If it is only Parquet file related then what is this change in 
> > AvroUtils. I haven't checked why it is necessary so please explain why does 
> > this change needed.

bytes field in parquet throws that it can not cast BytesWriteble to String. so 
converting from ByteWriteble to string with UTF-8


- Sandish Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61522/#review186844
---


On Oct. 4, 2017, 7:22 a.m., Sandish Kumar HN wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61522/
> ---
> 
> (Updated Oct. 4, 2017, 7:22 a.m.)
> 
> 
> Review request for Sqoop and Anna Szonyi.
> 
> 
> Bugs: SQOOP-2907
> https://issues.apache.org/jira/browse/SQOOP-2907
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> Kite currently requires .metadata.
> Parquet files have their own metadata stored along data files.
> It would be great for Export operation on parquet files to RDBMS not to 
> require .metadata.
> We have most of the files created by Spark and Hive, and they don't create 
> .metadata, it only Kite that does.
> It makes sqoop export of parquet files usability very limited.
> 
> 
> Diffs
> -
> 
>   src/java/org/apache/sqoop/avro/AvroUtil.java ee29f140 
>   src/java/org/apache/sqoop/mapreduce/JdbcExportJob.java 6f9afaf9 
>   src/test/com/cloudera/sqoop/TestParquetExport.java 680fd73b 
> 
> 
> Diff: https://reviews.apache.org/r/61522/diff/2/
> 
> 
> Testing
> ---
> 
> testSupportedParquetTypesForWithoutParquetMeta - done
> testNullableFieldWithoutParquetMeta - done
> testParquetRecordsNotSupportedWithoutParquetMeta -done
> testMissingDatabaseFieldsWithoutParquetMeta - done
> testMissingParquetFieldsWithoutParquetMeta - done
> 
> 
> Thanks,
> 
> Sandish Kumar HN
> 
>



Re: Review Request 61522: SQOOP-2907 : Export parquet files to RDBMS: don't require .metadata for parquet files

2017-10-04 Thread Sandish Kumar HN

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61522/
---

(Updated Oct. 4, 2017, 7:22 a.m.)


Review request for Sqoop and Anna Szonyi.


Changes
---

fiexed changes


Bugs: SQOOP-2907
https://issues.apache.org/jira/browse/SQOOP-2907


Repository: sqoop-trunk


Description
---

Kite currently requires .metadata.
Parquet files have their own metadata stored along data files.
It would be great for Export operation on parquet files to RDBMS not to require 
.metadata.
We have most of the files created by Spark and Hive, and they don't create 
.metadata, it only Kite that does.
It makes sqoop export of parquet files usability very limited.


Diffs (updated)
-

  src/java/org/apache/sqoop/avro/AvroUtil.java ee29f140 
  src/java/org/apache/sqoop/mapreduce/JdbcExportJob.java 6f9afaf9 
  src/test/com/cloudera/sqoop/TestParquetExport.java 680fd73b 


Diff: https://reviews.apache.org/r/61522/diff/2/

Changes: https://reviews.apache.org/r/61522/diff/1-2/


Testing
---

testSupportedParquetTypesForWithoutParquetMeta - done
testNullableFieldWithoutParquetMeta - done
testParquetRecordsNotSupportedWithoutParquetMeta -done
testMissingDatabaseFieldsWithoutParquetMeta - done
testMissingParquetFieldsWithoutParquetMeta - done


Thanks,

Sandish Kumar HN



Re: Review Request 61522: SQOOP-2907 : Export parquet files to RDBMS: don't require .metadata for parquet files

2017-10-02 Thread Zoltán Tóth

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61522/#review186844
---



Hey Sandish,

Thanks for your contribution. I am pretty sure your work will solve a lot of 
headache for a lot of people.

I'd some findings so I added my comments to the code. Please read them and put 
your comments/ideas to it.

Cheers, Zoli


src/java/org/apache/sqoop/avro/AvroUtil.java
Lines 194 (patched)


If it is only Parquet file related then what is this change in AvroUtils. I 
haven't checked why it is necessary so please explain why does this change 
needed.



src/java/org/apache/sqoop/avro/AvroUtil.java
Lines 195 (patched)


If this part is needed then you can use Java Charset to load UTF-8 so it 
won't throw IOException.



src/java/org/apache/sqoop/mapreduce/JdbcExportJob.java
Lines 100 (patched)


It is called here double time. Is it okay?



src/java/org/apache/sqoop/mapreduce/JdbcExportJob.java
Lines 107 (patched)


Why do you use double log.warn here? You can merge it to one.



src/java/org/apache/sqoop/mapreduce/JdbcExportJob.java
Lines 111 (patched)


Why don't use the catch block for this code snippet?



src/java/org/apache/sqoop/mapreduce/JdbcExportJob.java
Lines 120 (patched)


I think this shouldn't be public. If you use it only inside of this class 
then make it private. If you would like to use it outside of this class then a 
getter would make more sense.

It is also should be somewhere begining of the class because it is a static 
final variable. You can intialize it in a static block.



src/test/com/cloudera/sqoop/TestParquetExport.java
Lines 165 (patched)


This method is almost copy of the previous one. Please avoid duplicates and 
put the common parts into one method.

And please also change the comment of the method becuase that is the same 
but the two method doing different thing.

fileNum parameter is not used in the method, please remove it.



src/test/com/cloudera/sqoop/TestParquetExport.java
Lines 226 (patched)


Please do not copy and paste code parts which is already in the class. It 
increases duplication which makes harder to read and maintain later.

testSupportedParquetTypes() method also contains this part of the code. Put 
them into a different method like createParquetTestFileContent()



src/test/com/cloudera/sqoop/TestParquetExport.java
Lines 232 (patched)


We don't have an official formatter yet but I think a 120 columns length 
row is better than 80.



src/test/com/cloudera/sqoop/TestParquetExport.java
Lines 475 (patched)


Another duplication



src/test/com/cloudera/sqoop/TestParquetExport.java
Lines 527 (patched)


Please specify the expected exception here. Your test should have one good 
result only. Pleas avoid unnecessary usage of general exceptions.

Use expected exception instead of try - catch. It makes the code more 
readable.



src/test/com/cloudera/sqoop/TestParquetExport.java
Lines 614 (patched)


Same here. Please specify the exception and use expected exception instead 
of try - catch blocks.


- Zoltán Tóth


On Aug. 9, 2017, 10:53 a.m., Sandish Kumar HN wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61522/
> ---
> 
> (Updated Aug. 9, 2017, 10:53 a.m.)
> 
> 
> Review request for Sqoop and Anna Szonyi.
> 
> 
> Bugs: SQOOP-2907
> https://issues.apache.org/jira/browse/SQOOP-2907
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> Kite currently requires .metadata.
> Parquet files have their own metadata stored along data files.
> It would be great for Export operation on parquet files to RDBMS not to 
> require .metadata.
> We have most of the files created by Spark and Hive, and they don't create 
> .metadata, it only Kite that does.
> It makes sqoop export of parquet files usability very limited.
> 
> 
> Diffs
> -
> 
>   src/java/org/apache/sqoop/avro/AvroUtil.java ee29f140 
>   src/java/org/apache/sqoop/mapreduce/JdbcExportJob.java 6f9afaf9 
>   src/test/com/cloudera/sqoop/TestParquetExport.java 680fd73b 
> 
> 
> Diff: 

Re: Review Request 61522: SQOOP-2907 : Export parquet files to RDBMS: don't require .metadata for parquet files

2017-08-09 Thread Sandish Kumar HN

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61522/
---

(Updated Aug. 9, 2017, 10:50 a.m.)


Review request for Sqoop, AnnaTW AnnaTW and Anna Szonyi.


Bugs: SQOOP-2907
https://issues.apache.org/jira/browse/SQOOP-2907


Repository: sqoop-trunk


Description
---

Kite currently requires .metadata.
Parquet files have their own metadata stored along data files.
It would be great for Export operation on parquet files to RDBMS not to require 
.metadata.
We have most of the files created by Spark and Hive, and they don't create 
.metadata, it only Kite that does.
It makes sqoop export of parquet files usability very limited.


Diffs
-

  src/java/org/apache/sqoop/avro/AvroUtil.java ee29f140 
  src/java/org/apache/sqoop/mapreduce/JdbcExportJob.java 6f9afaf9 
  src/test/com/cloudera/sqoop/TestParquetExport.java 680fd73b 


Diff: https://reviews.apache.org/r/61522/diff/1/


Testing (updated)
---

testSupportedParquetTypesForWithoutParquetMeta - done
testNullableFieldWithoutParquetMeta - done
testParquetRecordsNotSupportedWithoutParquetMeta -done
testMissingDatabaseFieldsWithoutParquetMeta - done
testMissingParquetFieldsWithoutParquetMeta - done


Thanks,

Sandish Kumar HN



Re: Review Request 61522: SQOOP-2907 : Export parquet files to RDBMS: don't require .metadata for parquet files

2017-08-09 Thread Sandish Kumar HN

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61522/
---

(Updated Aug. 9, 2017, 10:48 a.m.)


Review request for Sqoop, AnnaTW AnnaTW and Anna Szonyi.


Bugs: SQOOP-2907
https://issues.apache.org/jira/browse/SQOOP-2907


Repository: sqoop-trunk


Description
---

Kite currently requires .metadata.
Parquet files have their own metadata stored along data files.
It would be great for Export operation on parquet files to RDBMS not to require 
.metadata.
We have most of the files created by Spark and Hive, and they don't create 
.metadata, it only Kite that does.
It makes sqoop export of parquet files usability very limited.


Diffs
-

  src/java/org/apache/sqoop/avro/AvroUtil.java ee29f140 
  src/java/org/apache/sqoop/mapreduce/JdbcExportJob.java 6f9afaf9 
  src/test/com/cloudera/sqoop/TestParquetExport.java 680fd73b 


Diff: https://reviews.apache.org/r/61522/diff/1/


Testing
---


Thanks,

Sandish Kumar HN



Re: Review Request 61522: SQOOP-2907 : Export parquet files to RDBMS: don't require .metadata for parquet files

2017-08-09 Thread Sandish Kumar HN

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61522/#review182476
---


Ship it!




tested

- Sandish Kumar HN


On Aug. 9, 2017, 10:46 a.m., Sandish Kumar HN wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61522/
> ---
> 
> (Updated Aug. 9, 2017, 10:46 a.m.)
> 
> 
> Review request for Sqoop, AnnaTW AnnaTW and Anna Szonyi.
> 
> 
> Bugs: SQOOP-2907
> https://issues.apache.org/jira/browse/SQOOP-2907
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> ---
> 
> Kite currently requires .metadata.
> Parquet files have their own metadata stored along data files.
> It would be great for Export operation on parquet files to RDBMS not to 
> require .metadata.
> We have most of the files created by Spark and Hive, and they don't create 
> .metadata, it only Kite that does.
> It makes sqoop export of parquet files usability very limited.
> 
> 
> Diffs
> -
> 
>   src/java/org/apache/sqoop/avro/AvroUtil.java ee29f140 
>   src/java/org/apache/sqoop/mapreduce/JdbcExportJob.java 6f9afaf9 
>   src/test/com/cloudera/sqoop/TestParquetExport.java 680fd73b 
> 
> 
> Diff: https://reviews.apache.org/r/61522/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sandish Kumar HN
> 
>