Hi,

Are you sure you are pointing to the right path and file? The error says:

    Caused by: java.io.FileNotFoundException: File does not exist: hdfs://*

Please make sure the CSV file is there.
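For example, you can check from the cluster that the input is actually there and readable (the path below is just the placeholder from your command; substitute your real bucket and key):

    # list the S3 input that the CsvBulkLoadTool was pointed at
    hadoop fs -ls s3://path/to/my/bucket/file.csv

    # or the same check with the AWS CLI
    aws s3 ls s3://path/to/my/bucket/file.csv

If the listing comes back empty or fails, the job has nothing to load.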
On Sunday, November 26, 2017, idosenesh <[email protected]> wrote:
> I'm trying to bulk load into Phoenix using the CsvBulkLoadTool.
> I'm running on an Amazon EMR cluster with 3 i3.2xlarge core nodes and default
> phoenix/hbase/emr configurations.
>
> I've successfully run the job 3 times (i.e. successfully inserted 3 CSV files
> of about 250G each), but the 4th run yields the following error:
>
> 2017-11-23 21:53:07,962 FATAL [IPC Server handler 7 on 39803] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1511332372804_0016_m_002760_1 - exited :
> java.lang.IllegalArgumentException: Can't read partitions file
>     at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
>     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:711)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:779)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.io.FileNotFoundException: File does not exist: hdfs://***************:8020/mnt/var/lib/hadoop/tmp/partitions_66f309d7-fe46-440a-99bb-fd8f3b40099e
>     at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1830)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1853)
>     at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.readPartitions(TotalOrderPartitioner.java:301)
>     at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:88)
>
> My HDFS utilization is not high:
>
> [hadoop@******** /]$ hdfs dfsadmin -report
> Configured Capacity: 5679504728064 (5.17 TB)
> Present Capacity: 5673831846248 (5.16 TB)
> DFS Remaining: 5333336719720 (4.85 TB)
> DFS Used: 340495126528 (317.11 GB)
> DFS Used%: 6.00%
> Under replicated blocks: 0
> Blocks with corrupt replicas: 0
> Missing blocks: 0
> Missing blocks (with replication factor 1): 0
>
> I'm running the following command:
>
> HADOOP_CLASSPATH=/usr/lib/hbase/hbase-protocol.jar:/usr/lib/hbase/conf hadoop jar /usr/lib/phoenix/phoenix-4.11.0-HBase-1.3-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool -Dfs.permissions.umask-mode=000 --table KEYWORDS_COMBINED_SALTED -d '|' --ignore-errors --input s3://path/to/my/bucket/file.csv
>
> The data in the last table is structurally the same as what was inserted before.
>
> Any ideas?
>
> --
> Sent from: http://apache-phoenix-user-list.1124778.n5.nabble.com/
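Just a thought, going only by the stack trace above: the file the mappers cannot find is the temporary partitions file that the bulk-load job writes to HDFS (under /mnt/var/lib/hadoop/tmp/ on your cluster), not the CSV itself. If the CSV checks out, it may also be worth checking whether anything is cleaning up that tmp directory while the job is still running, for example:

    # path taken from the exception message; run this while the job is in flight
    hdfs dfs -ls /mnt/var/lib/hadoop/tmp/ | grep partitions_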
