Re: Can't access file in spark, but can in hadoop

2015-03-28 Thread Ted Yu
Thanks for the follow-up, Dale.

bq. hdp 2.3.1

Minor correction: should be hdp 2.1.3

Cheers

On Sat, Mar 28, 2015 at 2:28 AM, Johnson, Dale daljohn...@ebay.com wrote:

  Actually I did figure this out eventually.

  I’m running on a Hortonworks cluster hdp 2.3.1 (hadoop 2.4.1).  Spark
 bundles the org/apache/hadoop/hdfs/… classes along with the spark-assembly
 jar.  This turns out to introduce a small incompatibility with hdp 2.3.1.
 I carved these classes out of the jar, and put a distro-provided jar into
 the class path for the hdfs classes, and this fixed the problem.

  Ideally there would be an exclusion in the pom to deal with this.

  Dale.


Re: Can't access file in spark, but can in hadoop

2015-03-28 Thread Johnson, Dale
Actually I did figure this out eventually.

I’m running on a Hortonworks cluster hdp 2.3.1 (hadoop 2.4.1).  Spark bundles 
the org/apache/hadoop/hdfs/… classes along with the spark-assembly jar.  This 
turns out to introduce a small incompatibility with hdp 2.3.1.  I carved these 
classes out of the jar, and put a distro-provided jar into the class path for 
the hdfs classes, and this fixed the problem.
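
A rough sketch of that jar surgery (jar names and paths here are illustrative, not the exact ones from the cluster): build a toy assembly jar, delete the bundled org/apache/hadoop/hdfs classes, and let a distro-provided jar supply those classes instead.

```shell
# Toy stand-in for the real spark-assembly jar (names are illustrative).
mkdir -p toy/org/apache/hadoop/hdfs toy/org/apache/spark
touch toy/org/apache/hadoop/hdfs/DFSClient.class
touch toy/org/apache/spark/SparkContext.class
(cd toy && zip -qr ../spark-assembly.jar org)

# Carve the bundled HDFS client classes out of the assembly.
zip -qd spark-assembly.jar 'org/apache/hadoop/hdfs/*'

# Then put the distro-provided HDFS jar on the classpath, e.g.
# (path is hypothetical):
#   export SPARK_CLASSPATH=/usr/lib/hadoop-hdfs/hadoop-hdfs.jar
unzip -l spark-assembly.jar
```

After the delete, the Spark classes remain in the jar while the HDFS client is resolved from the distro jar at runtime.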

Ideally there would be an exclusion in the pom to deal with this.
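
Purely as a sketch (coordinates and versions are my guesses, and the right place for such an exclusion depends on how the assembly is actually built), an application pom could exclude Spark's transitive Hadoop client and take the distro's as provided:

```xml
<!-- Sketch only: coordinates and versions are illustrative. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.3.0</version>
  <exclusions>
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <!-- Distro-provided HDFS/Hadoop client wins on the classpath. -->
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.4.1</version>
  <scope>provided</scope>
</dependency>
```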

Dale.


Re: Can't access file in spark, but can in hadoop

2015-03-27 Thread Johnson, Dale
Yes, I could recompile the hdfs client with more logging, but I don’t have the 
day or two to spare right this week.

One more thing about this, the cluster is Horton Works 2.1.3 [.0]

They seem to have a claim of supporting spark on Horton Works 2.2

Dale.


Re: Can't access file in spark, but can in hadoop

2015-03-27 Thread Zhan Zhang
Probably a guava version conflict issue. What Spark version did you use, and
which Hadoop version was it compiled against?

Thanks.

Zhan Zhang


Can't access file in spark, but can in hadoop

2015-03-26 Thread Dale Johnson
There seems to be a special kind of "corrupted according to Spark" state of
file in HDFS.  I have isolated a set of files (maybe 1% of all files I need
to work with) which are producing the following stack dump when I try to
sc.textFile() open them.  When I try to open directories, most large
directories contain at least one file of this type.  Curiously, the
following two lines fail inside of a Spark job, but not inside of a Scoobi
job:

val conf = new org.apache.hadoop.conf.Configuration
val fs = org.apache.hadoop.fs.FileSystem.get(conf)

The stack trace follows:

15/03/26 14:22:43 INFO yarn.ApplicationMaster: Final app status: FAILED,
exitCode: 15, (reason: User class threw exception: null)
Exception in thread "Driver" java.lang.IllegalStateException
    at org.spark-project.guava.common.base.Preconditions.checkState(Preconditions.java:133)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:673)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.convertLocatedBlock(PBHelper.java:1100)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1118)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1251)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1354)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1363)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:518)
    at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1743)
    at org.apache.hadoop.hdfs.DistributedFileSystem$15.<init>(DistributedFileSystem.java:738)
    at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:727)
    at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1662)
    at org.apache.hadoop.fs.FileSystem$5.<init>(FileSystem.java:1724)
    at org.apache.hadoop.fs.FileSystem.listFiles(FileSystem.java:1721)
    at com.ebay.ss.niffler.miner.speller.SpellQueryLaunch$$anonfun$main$2.apply(SpellQuery.scala:1125)
    at com.ebay.ss.niffler.miner.speller.SpellQueryLaunch$$anonfun$main$2.apply(SpellQuery.scala:1123)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at com.ebay.ss.niffler.miner.speller.SpellQueryLaunch$.main(SpellQuery.scala:1123)
    at com.ebay.ss.niffler.miner.speller.SpellQueryLaunch.main(SpellQuery.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:427)
15/03/26 14:22:43 INFO yarn.ApplicationMaster: Invoking sc stop from
shutdown hook

It appears to have found the three copies of the given HDFS block, but is
performing some sort of validation with them before giving them back to
spark to schedule the job.  But there is an assert failing.

I've tried this with 1.2.0, 1.2.1 and 1.3.0, and I get the exact same error;
the line numbers in the HDFS libraries change, but the function names do not.
I've tried recompiling myself against different hadoop versions, and it's the
same.  We're running hadoop 2.4.1 on our cluster.

A google search turns up absolutely nothing on this.

Any insight at all would be appreciated.

Dale Johnson
Applied Researcher
eBay.com




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Can-t-access-file-in-spark-but-can-in-hadoop-tp22251.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Can't access file in spark, but can in hadoop

2015-03-26 Thread Ted Yu
Looks like the following assertion failed:
  Preconditions.checkState(storageIDsCount == locs.size());

locs is a List&lt;DatanodeInfoProto&gt;
Can you enhance the assertion to log more information?

Cheers
