[GitHub] sqoop pull request #41: Sqoop-3224: Binary ftp transfer mode

2018-01-16 Thread christeoh
Github user christeoh commented on a diff in the pull request:

https://github.com/apache/sqoop/pull/41#discussion_r161689494
  
--- Diff: src/java/org/apache/sqoop/mapreduce/RawKeyTextOutputFormat.java 
---
@@ -61,6 +62,9 @@ private void writeObject(Object o) throws IOException {
   if (o instanceof Text) {
 Text to = (Text) o;
 out.write(to.getBytes(), 0, to.getLength());
+  } else if (o instanceof BytesWritable) {
--- End diff --

Thanks @szvasas . I need a bit of help with applying the technique on the 
OutputFormats as they contain inner classes.


---


[GitHub] sqoop pull request #41: Sqoop-3224: Binary ftp transfer mode

2017-12-22 Thread szvasas
Github user szvasas commented on a diff in the pull request:

https://github.com/apache/sqoop/pull/41#discussion_r158502479
  
--- Diff: src/java/org/apache/sqoop/mapreduce/RawKeyTextOutputFormat.java 
---
@@ -61,6 +62,9 @@ private void writeObject(Object o) throws IOException {
   if (o instanceof Text) {
 Text to = (Text) o;
 out.write(to.getBytes(), 0, to.getLength());
+  } else if (o instanceof BytesWritable) {
--- End diff --

I think it would be a good idea to keep the BinaryKeyOutputFormat because 
we change the behavior of RawKeyTextOutputFormat now which may lead to issues 
in other connectors and its name suggests that it handles text and not binary. 
You can use a technique to elimiate the code duplication similar I used in 
MainframeDatasetBinaryImportMapper.


---


[GitHub] sqoop pull request #41: Sqoop-3224: Binary ftp transfer mode

2017-12-22 Thread szvasas
Github user szvasas commented on a diff in the pull request:

https://github.com/apache/sqoop/pull/41#discussion_r158502339
  
--- Diff: src/java/org/apache/sqoop/SqoopOptions.java ---
@@ -308,7 +308,9 @@ public String toString() {
   // Indicates if the data set is on tape to use different FTP parser
   @StoredAsProperty("mainframe.input.dataset.tape")
   private String mainframeInputDatasetTape;
-
+  // Indicates if binary or ascii FTP transfer mode should be used
+  @StoredAsProperty("mainframe.ftp.transfermode")
+  private String mainframeFtpTransferMode;
--- End diff --

So my idea is to add a new field to 
com.cloudera.sqoop.SqoopOptions.FileLayout, let's say BinaryFile which would 
only be supported for mainframe imports (we should add validator to check this) 
and I would use this new field instead of introducing the new 
mainframeFtpTransferMode Sqoop option. This would not make the Sqoop interface 
more complex and would let us reuse already existing concepts. What do you 
think?


---


[GitHub] sqoop pull request #41: Sqoop-3224: Binary ftp transfer mode

2017-12-22 Thread szvasas
Github user szvasas commented on a diff in the pull request:

https://github.com/apache/sqoop/pull/41#discussion_r158502914
  
--- Diff: 
src/java/org/apache/sqoop/mapreduce/mainframe/MainframeDatasetBinaryRecord.java 
---
@@ -0,0 +1,128 @@
+/**
--- End diff --

I am still not sure we will need this new class.
I understand that when a SqoopRecord is generated runtime it will always 
have a text field and not binary but that might be solved by changing 
org.apache.sqoop.manager.MainframeManager#getColumnTypes

My guess is that if we changed that to return column type based on the 
transfer mode or file type in the SqoopOptions object we could just use the 
generated class.


---


[GitHub] sqoop pull request #41: Sqoop-3224: Binary ftp transfer mode

2017-12-12 Thread christeoh
GitHub user christeoh opened a pull request:

https://github.com/apache/sqoop/pull/41

Sqoop-3224: Binary ftp transfer mode



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/christeoh/sqoop SQOOP-3224

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/sqoop/pull/41.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #41


commit 3efb449f132d366150d118282d5e1d69e3585cde
Author: Chris Teoh 
Date:   2017-10-25T10:20:07Z

Added support for binary ftp transfers

commit 045ef06ca94fe9b1f0f330c07b605f2ca23bbe18
Author: Chris Teoh 
Date:   2017-11-15T02:36:07Z

Moved mainframe FTP transfermode default setting to initDefaults()

commit 2f47f54da74a75be42ece621f2faeb7bcb640622
Author: Chris Teoh 
Date:   2017-11-15T02:38:02Z

Replaced import java.io.* with single class imports

commit e6c9dc582aa41dd88bb354dff5f807169cce8b10
Author: Chris Teoh 
Date:   2017-11-16T02:07:07Z

Removed excessive logging per record to improve performance

commit cad3f010d673ced5cbe9df3e9f6c78ea3917ac3c
Author: Chris Teoh 
Date:   2017-11-16T02:07:42Z

Added comment to document why we need to add custom class for binary 
transfers

commit f08fad3cefb219ba14317d151b18081079cdcdef
Author: Chris Teoh 
Date:   2017-11-16T03:27:48Z

Converted to use BufferedInputStream instead of InputStream

commit 2ccf3f077287011928383a9992b7472cfd4d0358
Author: Chris Teoh 
Date:   2017-11-17T00:57:48Z

Added unit tests for MainframeDatasetFTPRecordReader.getNextBinaryRecord

commit 8dd951b88f1ea3647e023b42e75fd687b14aff29
Author: Chris Teoh 
Date:   2017-11-17T01:19:04Z

Updated unit tests and used helper classes

commit ee5d4b27b804f5d9933ce859dfb2bcc289a001bd
Author: Chris Teoh 
Date:   2017-11-17T01:22:06Z

Updated unit tests to use a method of org.junit.Assert

commit 0ec15ebc35eb7839929b3c1c0063ecad9d874a00
Author: Chris Teoh 
Date:   2017-11-17T01:35:04Z

Updated unit test for compilation

commit cff44a3082a2ec52f072fd5f35d31606c551992a
Author: Chris Teoh 
Date:   2017-11-17T05:28:57Z

Used StringUtils to do comparisons and corrected bulk imports

commit 6feb6b60a54e16ddfd7d3f3457a47d10816ab81a
Author: Chris Teoh 
Date:   2017-11-28T03:51:07Z

Replaced star import with specific class import

commit a1a479146fed0f518609f1deb1f2c3e425d90ae2
Author: Chris Teoh 
Date:   2017-11-28T03:51:33Z

Updated to use current class instead of deprecated class

commit 55cdb8f54e62fa8dc24acdcbee68d1e2541d9c0c
Author: Chris Teoh 
Date:   2017-11-28T03:52:10Z

Refactored common functionality to another function

commit c6a80348d4202eb7b0a47e41c472042294550b50
Author: Chris Teoh 
Date:   2017-11-28T03:52:30Z

Adjusted comment

commit e4e923c3d011d215b64dd099ba5931b98fa96b78
Author: Chris Teoh 
Date:   2017-11-29T04:36:44Z

Moved tests from TestMainframeDatasetFTPRecordReader to separate class

commit eb598fc7919ffb493cbdf03ff23e467f84bdf617
Author: Chris Teoh 
Date:   2017-11-29T04:37:15Z

Adjusted class for unit test support:

commit d48601cd27b216826775bd4974a6e2088600c4ae
Author: Chris Teoh 
Date:   2017-11-29T04:38:43Z

Adjusted exceptions to print full stack

commit f0d440cb02b4648e55bf699170dfd27c6f6ae903
Author: Chris Teoh 
Date:   2017-11-29T04:39:05Z

Moved unit tests to another class

commit cef76eda1271b78d3a577cffdd285f93ea218b22
Author: Chris Teoh 
Date:   2017-11-29T06:17:48Z

Updated unit tests

commit 6a86f8e2f9c35e29493c744273da50d88d0ccc0d
Author: Chris Teoh 
Date:   2017-11-29T11:27:58Z

Tidied up unit tests

commit 775105ce5adb2905c48e5ada271566a7ff6c5176
Author: Chris Teoh 
Date:   2017-11-29T11:43:28Z

Updated getNextBinaryRecord logic to be simpler




---