[GitHub] sqoop pull request #41: Sqoop-3224: Binary ftp transfer mode
Github user christeoh commented on a diff in the pull request: https://github.com/apache/sqoop/pull/41#discussion_r161689494 --- Diff: src/java/org/apache/sqoop/mapreduce/RawKeyTextOutputFormat.java --- @@ -61,6 +62,9 @@ private void writeObject(Object o) throws IOException { if (o instanceof Text) { Text to = (Text) o; out.write(to.getBytes(), 0, to.getLength()); + } else if (o instanceof BytesWritable) { --- End diff -- Thanks @szvasas . I need a bit of help with applying the technique on the OutputFormats as they contain inner classes. ---
[GitHub] sqoop pull request #41: Sqoop-3224: Binary ftp transfer mode
Github user szvasas commented on a diff in the pull request: https://github.com/apache/sqoop/pull/41#discussion_r158502479 --- Diff: src/java/org/apache/sqoop/mapreduce/RawKeyTextOutputFormat.java --- @@ -61,6 +62,9 @@ private void writeObject(Object o) throws IOException { if (o instanceof Text) { Text to = (Text) o; out.write(to.getBytes(), 0, to.getLength()); + } else if (o instanceof BytesWritable) { --- End diff -- I think it would be a good idea to keep the BinaryKeyOutputFormat because we change the behavior of RawKeyTextOutputFormat now which may lead to issues in other connectors and its name suggests that it handles text and not binary. You can use a technique to elimiate the code duplication similar I used in MainframeDatasetBinaryImportMapper. ---
[GitHub] sqoop pull request #41: Sqoop-3224: Binary ftp transfer mode
Github user szvasas commented on a diff in the pull request: https://github.com/apache/sqoop/pull/41#discussion_r158502339 --- Diff: src/java/org/apache/sqoop/SqoopOptions.java --- @@ -308,7 +308,9 @@ public String toString() { // Indicates if the data set is on tape to use different FTP parser @StoredAsProperty("mainframe.input.dataset.tape") private String mainframeInputDatasetTape; - + // Indicates if binary or ascii FTP transfer mode should be used + @StoredAsProperty("mainframe.ftp.transfermode") + private String mainframeFtpTransferMode; --- End diff -- So my idea is to add a new field to com.cloudera.sqoop.SqoopOptions.FileLayout, let's say BinaryFile which would only be supported for mainframe imports (we should add validator to check this) and I would use this new field instead of introducing the new mainframeFtpTransferMode Sqoop option. This would not make the Sqoop interface more complex and would let us reuse already existing concepts. What do you think? ---
[GitHub] sqoop pull request #41: Sqoop-3224: Binary ftp transfer mode
Github user szvasas commented on a diff in the pull request: https://github.com/apache/sqoop/pull/41#discussion_r158502914 --- Diff: src/java/org/apache/sqoop/mapreduce/mainframe/MainframeDatasetBinaryRecord.java --- @@ -0,0 +1,128 @@ +/** --- End diff -- I am still not sure we will need this new class. I understand that when a SqoopRecord is generated runtime it will always have a text field and not binary but that might be solved by changing org.apache.sqoop.manager.MainframeManager#getColumnTypes My guess is that if we changed that to return column type based on the transfer mode or file type in the SqoopOptions object we could just use the generated class. ---
[GitHub] sqoop pull request #41: Sqoop-3224: Binary ftp transfer mode
GitHub user christeoh opened a pull request: https://github.com/apache/sqoop/pull/41 Sqoop-3224: Binary ftp transfer mode You can merge this pull request into a Git repository by running: $ git pull https://github.com/christeoh/sqoop SQOOP-3224 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/sqoop/pull/41.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #41 commit 3efb449f132d366150d118282d5e1d69e3585cde Author: Chris TeohDate: 2017-10-25T10:20:07Z Added support for binary ftp transfers commit 045ef06ca94fe9b1f0f330c07b605f2ca23bbe18 Author: Chris Teoh Date: 2017-11-15T02:36:07Z Moved mainframe FTP transfermode default setting to initDefaults() commit 2f47f54da74a75be42ece621f2faeb7bcb640622 Author: Chris Teoh Date: 2017-11-15T02:38:02Z Replaced import java.io.* with single class imports commit e6c9dc582aa41dd88bb354dff5f807169cce8b10 Author: Chris Teoh Date: 2017-11-16T02:07:07Z Removed excessive logging per record to improve performance commit cad3f010d673ced5cbe9df3e9f6c78ea3917ac3c Author: Chris Teoh Date: 2017-11-16T02:07:42Z Added comment to document why we need to add custom class for binary transfers commit f08fad3cefb219ba14317d151b18081079cdcdef Author: Chris Teoh Date: 2017-11-16T03:27:48Z Converted to use BufferedInputStream instead of InputStream commit 2ccf3f077287011928383a9992b7472cfd4d0358 Author: Chris Teoh Date: 2017-11-17T00:57:48Z Added unit tests for MainframeDatasetFTPRecordReader.getNextBinaryRecord commit 8dd951b88f1ea3647e023b42e75fd687b14aff29 Author: Chris Teoh Date: 2017-11-17T01:19:04Z Updated unit tests and used helper classes commit ee5d4b27b804f5d9933ce859dfb2bcc289a001bd Author: Chris Teoh Date: 2017-11-17T01:22:06Z Updated unit tests to use a method of org.junit.Assert commit 0ec15ebc35eb7839929b3c1c0063ecad9d874a00 Author: Chris Teoh Date: 2017-11-17T01:35:04Z Updated unit test for compilation commit cff44a3082a2ec52f072fd5f35d31606c551992a Author: Chris Teoh Date: 2017-11-17T05:28:57Z Used StringUtils to do comparisons and corrected bulk imports commit 6feb6b60a54e16ddfd7d3f3457a47d10816ab81a Author: Chris Teoh Date: 2017-11-28T03:51:07Z Replaced star import with specific class import commit a1a479146fed0f518609f1deb1f2c3e425d90ae2 Author: Chris Teoh Date: 2017-11-28T03:51:33Z Updated to use current class instead of deprecated class commit 55cdb8f54e62fa8dc24acdcbee68d1e2541d9c0c Author: Chris Teoh Date: 2017-11-28T03:52:10Z Refactored common functionality to another function commit c6a80348d4202eb7b0a47e41c472042294550b50 Author: Chris Teoh Date: 2017-11-28T03:52:30Z Adjusted comment commit e4e923c3d011d215b64dd099ba5931b98fa96b78 Author: Chris Teoh Date: 2017-11-29T04:36:44Z Moved tests from TestMainframeDatasetFTPRecordReader to separate class commit eb598fc7919ffb493cbdf03ff23e467f84bdf617 Author: Chris Teoh Date: 2017-11-29T04:37:15Z Adjusted class for unit test support: commit d48601cd27b216826775bd4974a6e2088600c4ae Author: Chris Teoh Date: 2017-11-29T04:38:43Z Adjusted exceptions to print full stack commit f0d440cb02b4648e55bf699170dfd27c6f6ae903 Author: Chris Teoh Date: 2017-11-29T04:39:05Z Moved unit tests to another class commit cef76eda1271b78d3a577cffdd285f93ea218b22 Author: Chris Teoh Date: 2017-11-29T06:17:48Z Updated unit tests commit 6a86f8e2f9c35e29493c744273da50d88d0ccc0d Author: Chris Teoh Date: 2017-11-29T11:27:58Z Tidied up unit tests commit 775105ce5adb2905c48e5ada271566a7ff6c5176 Author: Chris Teoh Date: 2017-11-29T11:43:28Z Updated getNextBinaryRecord logic to be simpler ---