Hi, Jarek, I believe it certainly would. Is that something I can help with?
Andre On 16 July 2013 02:55, Jarek Jarcec Cecho <[email protected]> wrote: > Hi Andre, > thank you for the blog post. Do you think that it would be helpful to put > the additional information into the Sqoop Troubleshooting guide? > > > http://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html#_oracle_connection_reset_errors > > Jarcec > > On Mon, Jul 15, 2013 at 01:12:27PM +1000, Andre Araujo wrote: > > Thanks, David. > > > > My blog post is pending revision and should be published soon. I'll post > > the final link when it does. > > For the time being, please see below a copy of it without the formatting. > > What worked for me was a combination of two things: > > > > - passing the "-D > > mapred.child.java.opts="-Djava.security.egd=file:/dev/../dev/urandom"" > > parameter to sqoop > > - setting the java.security.egd parameter in the HADOOP_OPTS variable, > > so that it was passed to "${HADOOP_COMMON_HOME}/bin/hadoop > > org.apache.sqoop.Sqoop" > > > > Regards, > > Andre > > > > ----------------------------------- > > > > I’ve been using Sqoop to load data into HDFS from Oracle. I’m using > version > > 1.4.3 of Sqoop, running on a Linux machine and using the Oracle JDBC > driver > > with JDK 1.6. > > > > I was getting intermittent connection resets when trying to import data. > > After much troubleshooting, I eventually found the problem to be related > to > > a known issue with the JDBC driver and found a way to work around it, > which > > is described in the post > > > > > > The problem > > > > I noticed that when I was importing data at times where the machine I was > > running the sqoop client at was mostly idle, everything would run just > > fine. However, at times when others started to work on the same machine > and > > it became a bit busier, I would start to get the errors below > > intermittently: > > > > [araujo@client ~]$ sqoop import --connect jdbc:oracle:thin:user/pwd@host > /orcl > > -m 1 --query 'select 1 from dual where $CONDITIONS' --target-dir test > > 13/07/12 09:35:39 INFO manager.SqlManager: Using default fetchSize of > 1000 > > 13/07/12 09:35:39 INFO tool.CodeGenTool: Beginning code generation > > 13/07/12 09:37:53 ERROR manager.SqlManager: Error executing statement: > > java.sql.SQLRecoverableException: IO Error: Connection reset > > at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:467) > > at oracle.jdbc.driver.PhysicalConnection.(PhysicalConnection.java:546) > > ... > > Caused by: java.net.SocketException: Connection reset > > at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96) > > at java.net.SocketOutputStream.write(SocketOutputStream.java:136) > > ... 24 more > > After some troubleshooting and googling, I found that the problem seemed > to > > be related to the issue described in the following articles: > > > > > http://stackoverflow.com/questions/2327220/oracle-jdbc-intermittent-connection-issue/ > > https://forums.oracle.com/message/3701989/ > > > > Confirming the problem > > > > To ensure the problem was the same as the one described in the articles, > > and not something else intrinsic to Sqoop, I created a small Java program > > that simply connected to the database. I was able to reproduce the issue > > using it: > > > > [araujo@client TestConn]$ time java TestConn > > Exception in thread "main" java.sql.SQLRecoverableException: IO Error: > > Connection reset > > ... > > Caused by: java.net.SocketException: Connection reset > > ... > > ... 8 more > > > > real 1m20.481s > > user 0m0.491s > > sys 0m0.051s > > The workaround suggested in the articles also worked: > > > > [araujo@client TestConn]$ time java > > -Djava.security.egd=file:/dev/../dev/urandom TestConn > > Connection successful! > > > > real 0m0.419s > > user 0m0.498s > > sys 0m0.036s > > Applying the fix to Sqoop > > > > It took me a while to figure out how to use the workaround above with > > Sqoop. Many tentatives to specify the parameter in the Sqoop command > line, > > in many different forms, didn’t work as expected. > > > > The articles mention that the java.security.egd parameter can be > centrally > > set in the $JAVA_HOME/jre/lib/security/java.security file. Unfortunately, > > this didn’t work for me. Using strace, I confirmed that Sqoop was > actually > > reading the java.security file but the setting just didn’t take effect. I > > couldn’t figure out why not and eventually gave up that alternative. > > > > After a bit more of stracing and troubleshooting, though, I finally > figured > > a way out. > > > > Sqoop seems to use the JDBC driver in two different ways: > > > > First, it connects to the Oracle database directly. It does that to > gather > > more information about the tables (or query) from where the data is > > extracted and generate the map reduce job that it will run. > > Second, the map reduce job generated by Sqoop uses the JDBC driver to > > connect to the database and perform the actual data import. > > I was hitting the problem in the first case above, but I believe in both > > cases there’s a potential for the problem to occur. So, ideally, we > should > > apply the workaround to both cases. > > > > The Sqoop documentation clearly gives us an option to address the second > > case: using the following parameter to Sqoop allows us to pass Java > command > > line options to the map reduce job: > > > > sqoop import -D > > mapred.child.java.opts="-Djava.security.egd=file:/dev/../dev/urandom" ... > > Even though I couldn’t fully prove the above, since I couldn’t > consistently > > reproduce the problem for the map reduce tasks, I believe (and hope) it > > should work well. > > > > The Sqoop direct connection to Oracle > > > > The problem with the direct connection from Sqoop to Oracle, though, > wasn’t > > resolved by that option. Trying to pass the > > “-Djava.security.egd=file:/dev/../dev/urandom” option directly to Sqoop > > didn’t work either. > > > > After digging up a bit I found that the sqoop command eventually calls > > ${HADOOP_COMMON_HOME}/bin/hadoop to execute the org.apache.sqoop.Sqoop > > class. Since the hadoop executable is used, it accepts Java command line > > options through the HADOOP_OPTS environment variable. > > > > A quick test confirmed that the case was closed: > > > > [araujo@client STimport]$ export > > HADOOP_OPTS=-Djava.security.egd=file:/dev/../dev/urandom > > [araujo@client STimport]$ sqoop import -D > > mapred.child.java.opts="-Djava.security.egd=file:/dev/../dev/urandom" > > --connect jdbc:oracle:thin:user/pwd@host/orcl -m 1 --query 'select 1 > from > > dual where $CONDITIONS' --target-dir test > > 13/07/12 10:08:17 INFO manager.SqlManager: Using default fetchSize of > 1000 > > 13/07/12 10:08:17 INFO tool.CodeGenTool: Beginning code generation > > 13/07/12 10:08:18 INFO manager.OracleManager: Time zone has been set to > GMT > > 13/07/12 10:08:18 INFO manager.SqlManager: Executing SQL statement: > select > > 1 from dual where (1 = 0) > > 13/07/12 10:08:18 INFO manager.SqlManager: Executing SQL statement: > select > > 1 from dual where (1 = 0) > > 13/07/12 10:08:18 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is > > > /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/bin/../lib/hadoop-0.20-mapreduce > > 13/07/12 10:08:18 INFO orm.CompilationManager: Found hadoop core jar at: > > > /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/bin/../lib/hadoop-0.20-mapreduce/hadoop-core.jar > > Note: > > > /tmp/sqoop-araujo/compile/02ed1ccf04debf4769910b93ca67d2ba/QueryResult.java > > uses or overrides a deprecated API. > > Note: Recompile with -Xlint:deprecation for details. > > 13/07/12 10:08:19 INFO orm.CompilationManager: Writing jar file: > > > /tmp/sqoop-araujo/compile/02ed1ccf04debf4769910b93ca67d2ba/QueryResult.jar > > 13/07/12 10:08:19 INFO mapreduce.ImportJobBase: Beginning query import. > > 13/07/12 10:08:19 WARN mapred.JobClient: Use GenericOptionsParser for > > parsing the arguments. Applications should implement Tool for the same. > > 13/07/12 10:08:21 INFO mapred.JobClient: Running job: > job_201306141710_0075 > > 13/07/12 10:08:22 INFO mapred.JobClient: map 0% reduce 0% > > 13/07/12 10:08:31 INFO mapred.JobClient: map 100% reduce 0% > > 13/07/12 10:08:33 INFO mapred.JobClient: Job complete: > job_201306141710_0075 > > 13/07/12 10:08:33 INFO mapred.JobClient: Counters: 23 > > 13/07/12 10:08:33 INFO mapred.JobClient: File System Counters > > 13/07/12 10:08:33 INFO mapred.JobClient: FILE: Number of bytes read=0 > > 13/07/12 10:08:33 INFO mapred.JobClient: FILE: Number of bytes > > written=179438 > > 13/07/12 10:08:33 INFO mapred.JobClient: FILE: Number of read > > operations=0 > > 13/07/12 10:08:33 INFO mapred.JobClient: FILE: Number of large read > > operations=0 > > 13/07/12 10:08:33 INFO mapred.JobClient: FILE: Number of write > > operations=0 > > 13/07/12 10:08:33 INFO mapred.JobClient: HDFS: Number of bytes > read=87 > > 13/07/12 10:08:33 INFO mapred.JobClient: HDFS: Number of bytes > written=2 > > 13/07/12 10:08:33 INFO mapred.JobClient: HDFS: Number of read > > operations=1 > > 13/07/12 10:08:33 INFO mapred.JobClient: HDFS: Number of large read > > operations=0 > > 13/07/12 10:08:33 INFO mapred.JobClient: HDFS: Number of write > > operations=1 > > 13/07/12 10:08:33 INFO mapred.JobClient: Job Counters > > 13/07/12 10:08:33 INFO mapred.JobClient: Launched map tasks=1 > > 13/07/12 10:08:33 INFO mapred.JobClient: Total time spent by all maps > > in occupied slots (ms)=7182 > > 13/07/12 10:08:33 INFO mapred.JobClient: Total time spent by all > > reduces in occupied slots (ms)=0 > > 13/07/12 10:08:33 INFO mapred.JobClient: Total time spent by all maps > > waiting after reserving slots (ms)=0 > > 13/07/12 10:08:33 INFO mapred.JobClient: Total time spent by all > > reduces waiting after reserving slots (ms)=0 > > 13/07/12 10:08:33 INFO mapred.JobClient: Map-Reduce Framework > > 13/07/12 10:08:33 INFO mapred.JobClient: Map input records=1 > > 13/07/12 10:08:33 INFO mapred.JobClient: Map output records=1 > > 13/07/12 10:08:33 INFO mapred.JobClient: Input split bytes=87 > > 13/07/12 10:08:33 INFO mapred.JobClient: Spilled Records=0 > > 13/07/12 10:08:33 INFO mapred.JobClient: CPU time spent (ms)=940 > > 13/07/12 10:08:33 INFO mapred.JobClient: Physical memory (bytes) > > snapshot=236580864 > > 13/07/12 10:08:33 INFO mapred.JobClient: Virtual memory (bytes) > > snapshot=34998603776 > > 13/07/12 10:08:33 INFO mapred.JobClient: Total committed heap usage > > (bytes)=1013252096 > > 13/07/12 10:08:33 INFO mapreduce.ImportJobBase: Transferred 2 bytes in > > 14.4198 seconds (0.1387 bytes/sec) > > 13/07/12 10:08:33 INFO mapreduce.ImportJobBase: Retrieved 1 records. > > - See more at: > > > http://www.pythian.com/blog/?p=56405&preview=true&preview_id=56405&preview_nonce=7dfd5e1733#sthash.mUjOwAdP.dpuf > > > > > > > > > > > > On 15 July 2013 12:06, David Robson <[email protected]> > wrote: > > > > > Hi Andre,**** > > > > > > ** ** > > > > > > I sent this reply earlier – but I don’t think it went through to the > list > > > – so sorry if this ends up being a repost (I also BCCd you).**** > > > > > > ** ** > > > > > > I had a similar issue – you can see more details here: > > > https://questmos.jira.com/browse/ORAOOP-2**** > > > > > > ** ** > > > > > > Basically if you use OraOop 1.6 – it should set this for you > > > automatically. The only problem is if someone has set the parameter as > > > final on the server – then obviously the job can’t override it.**** > > > > > > ** ** > > > > > > I think the reason it’s not working for you is you need to set it as > > > follows:**** > > > > > > ** ** > > > > > > -Dmapred.child.java.opts="-Djava.security.egd=file:///dev/urandom"**** > > > > > > ** ** > > > > > > You could try that – or download OraOop from here: > > > http://www.toadworld.com/m/freeware/565.aspx**** > > > > > > ** ** > > > > > > It is Apache 2.0 licensed and the source code is available > > > https://github.com/QuestSoftwareTCD/OracleSQOOPconnector**** > > > > > > ** ** > > > > > > *David Robson***** > > > > > > Software Developer**** > > > > > > *Dell** *| R&D, Quest Software**** > > > > > > *office *+61 3 9811 8082**** > > > > > > ** ** > > > > > > Quest Software is now part of Dell**** > > > > > > ** ** > > > > > > *From:* Andre Araujo [mailto:[email protected]] > > > *Sent:* Monday, 15 July 2013 10:30 AM > > > *To:* [email protected] > > > *Subject:* Re: Intermittent problems with sqoop using Oracle JDBC > driver** > > > ** > > > > > > ** ** > > > > > > Hi, Jarek,**** > > > > > > ** ** > > > > > > I'll fix that and reply to this email when the link is working.**** > > > > > > ** ** > > > > > > Thanks for pointing that out.**** > > > > > > ** ** > > > > > > Andre**** > > > > > > ** ** > > > > > > On 13 July 2013 01:35, Jarek Jarcec Cecho <[email protected]> > wrote:**** > > > > > > Hi Andre, > > > thank you for sharing the solution with us! The link do not seem to be > > > valid, would you mind checking if it's correct? > > > > > > Jarcec**** > > > > > > > > > On Sat, Jul 13, 2013 at 01:05:19AM +1000, Andre Araujo wrote: > > > > Hi, all, > > > > > > > > I've found the answer for my own question. > > > > The solution is explained here: > > > > http://www.pythian.com/blog/?p=56405&preview=true > > > > > > > > Regards, > > > > Andre > > > > > > > > > > > > > > > > On 12 July 2013 15:57, Andre Araujo <[email protected]> wrote: > > > > > > > > > Hi, > > > > > > > > > > While using sqoop to import data from an Oracle database I was > noticing > > > > > intermittent problems with sqoop hanging. After some > troubleshooting I > > > > > narrowed the problem down to the issue described in the following > > > pages: > > > > > > > > > > > > > > > > > > > http://stackoverflow.com/questions/2327220/oracle-jdbc-intermittent-connect= > > > > > ion-issue > > > > > https://forums.oracle.com/message/3701989 > > > > > > > > > > The problem seems to be directly related to the Oracle JDBC > driver. As > > > per > > > > > the articles above, I can tell the driver to use /dev/urandom by > using > > > the > > > > > option -Djava.security.egd. I tested it with a standalone Java > program > > > > > connecting to the database using the JDBC driver and it works well. > > > > > > > > > > sqoop, though, uses JDBC in two ways. It connects directly to the > > > database > > > > > thru JDBC to gather details about the table/query and generate the > > > mapred > > > > > job. It also uses the driver indirectly in the mapred tasks. > > > > > > > > > > We can use the -D > mapred.child.java.opts=3D"-Djava.security.egd=3D..." > > > > > opti= > > > > > on > > > > > of sqoop to tell the mapred tasks to use the option. However, I > haven't > > > > > found any way to tell sqoop to use it while connecting directly to > the > > > > > database. Because of this, it hangs soon after I start it and even > > > before > > > > > it kicks off the mapred job. > > > > > > > > > > I also tried to pass the java.security.egd option directly to sqoop > > > with > > > > > the following command line but it didn't work: > > > > > > > > > > -D java.security.egd=3D/dev/urandom -D mapred.child.java.opts=3D"-D > > > > > java.security.egd=3D/dev/urandom" > > > > > > > > > > I added the option to the java.security file, but for some reason > it > > > also > > > > > didn't sort effect for sqoop. > > > > > > > > > > Any idea on a workaround/solution for this? > > > > > > > > > > Thanks, > > > > > Andre > > > > > > > > > > -- > > > > > André Araújo > > > > > Database Administrator / SDM > > > > > The Pythian Group - Australia - www.pythian.com > > > > > > > > > > Office (calls from within Australia): 1300 366 021 x1270**** > > > > > > > > Office (international): +61 2 9191 7427 > > > > > x270<%2B61%202%209191%207427%20%20x270>*OR* > +1 > > > 613 565 8696 x1270 <%2B1%20613%20565%208696%20%20%20x1270>**** > > > > > > > > Mobile: +61 410 323 559 > > > > > Fax: +61 2 9805 0544 > > > > > IM: pythianaraujo @ AIM/MSN/Y! or [email protected] @ GTalk > > > > > > > > > > > > > > > > > > > > > > -- > > > > André Araújo > > > > Database Administrator / SDM > > > > The Pythian Group - Australia - www.pythian.com > > > > > > > > Office (calls from within Australia): 1300 366 021 x1270**** > > > > > > > Office (international): +61 2 9191 7427 > > > > x270<%2B61%202%209191%207427%20%20x270>*OR* > +1 > > > 613 565 8696 x1270 <%2B1%20613%20565%208696%20%20%20x1270>**** > > > > > > > Mobile: +61 410 323 559 > > > > Fax: +61 2 9805 0544 > > > > IM: pythianaraujo @ AIM/MSN/Y! or [email protected] @ GTalk > > > > > > > > -- > > > > > > > >**** > > > > > > > -- > > > > > > > > > > > >**** > > > > > > > > > > > > **** > > > > > > ** ** > > > > > > -- **** > > > > > > André Araújo > > > Database Administrator / SDM > > > The Pythian Group - Australia - www.pythian.com**** > > > > > > Office (calls from within Australia): 1300 366 021 x1270 > > > Office (international): +61 2 9191 7427 x270 *OR* +1 613 565 8696 > x1270 > > > Mobile: +61 410 323 559 > > > Fax: +61 2 9805 0544 > > > IM: pythianaraujo @ AIM/MSN/Y! or [email protected] @ GTalk**** > > > > > > ** ** > > > > > > ** ** > > > > > > --**** > > > > > > ** ** > > > > > > > > > > > -- > > André Araújo > > Database Administrator / SDM > > The Pythian Group - Australia - www.pythian.com > > > > Office (calls from within Australia): 1300 366 021 x1270 > > Office (international): +61 2 9191 7427 x270 *OR* +1 613 565 8696 > x1270 > > Mobile: +61 410 323 559 > > Fax: +61 2 9805 0544 > > IM: pythianaraujo @ AIM/MSN/Y! or [email protected] @ GTalk > > > > -- > > > > > > -- > > > > > > > -- André Araújo Database Administrator / SDM The Pythian Group - Australia - www.pythian.com Office (calls from within Australia): 1300 366 021 x1270 Office (international): +61 2 9191 7427 x270 *OR* +1 613 565 8696 x1270 Mobile: +61 410 323 559 Fax: +61 2 9805 0544 IM: pythianaraujo @ AIM/MSN/Y! or [email protected] @ GTalk -- --
