[jira] [Updated] (SPARK-5068) When the path is not found in HDFS, we can't get the result

2015-04-25 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-5068:
-
Assignee: dongxu

> When the path is not found in HDFS, we can't get the result
> ---
>
> Key: SPARK-5068
> URL: https://issues.apache.org/jira/browse/SPARK-5068
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.0
>Reporter: jeanlyn
>Assignee: dongxu
> Fix For: 1.4.0
>
>
> When a partition path exists in the metastore but not in HDFS, it causes
> problems such as the following:
> {noformat}
> hive> show partitions partition_test;
> OK
> dt=1
> dt=2
> dt=3
> dt=4
> Time taken: 0.168 seconds, Fetched: 4 row(s)
> {noformat}
> {noformat}
> hive> dfs -ls /user/jeanlyn/warehouse/partition_test;
> Found 3 items
> drwxr-xr-x   - jeanlyn supergroup  0 2014-12-02 16:29 
> /user/jeanlyn/warehouse/partition_test/dt=1
> drwxr-xr-x   - jeanlyn supergroup  0 2014-12-02 16:29 
> /user/jeanlyn/warehouse/partition_test/dt=3
> drwxr-xr-x   - jeanlyn supergroup  0 2014-12-02 17:42 
> /user/jeanlyn/warehouse/partition_test/dt=4
> {noformat}
> When I run the SQL
> {noformat}
> select * from partition_test limit 10
> {noformat}
> in *hive*, it completes without a problem, but when I run it in *spark-sql* I
> get the following error:
> {noformat}
> Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: 
> Input path does not exist: 
> hdfs://jeanlyn:9000/user/jeanlyn/warehouse/partition_test/dt=2
> at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:251)
> at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
> at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:201)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
> at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
> at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
> at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> at scala.collection.AbstractTraversable.map(Traversable.scala:105)
> at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:66)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
> at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1328)
> at org.apache.spark.rdd.RDD.collect(RDD.scala:780)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:84)
> at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:444)
> at org.apache.spark.sql.hive.testpartition$.main(test.scala:23)
> at org.apache.spark.sql.hive.testpartition.main(test.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
> {noformat}
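
For context, a minimal way to reproduce the reported state is to register a partition in the metastore and then delete its directory from HDFS. The DDL below is hypothetical (the original report does not include the table definition), and the warehouse path is taken from the listing above:

{noformat}
hive> create table partition_test (id int) partitioned by (dt string);
hive> alter table partition_test add partition (dt='1');
hive> alter table partition_test add partition (dt='2');
hive> alter table partition_test add partition (dt='3');
hive> alter table partition_test add partition (dt='4');
-- delete one partition directory directly, leaving its metastore entry behind
-- (use -rmr instead of -rm -r on Hadoop 1.x)
hive> dfs -rm -r /user/jeanlyn/warehouse/partition_test/dt=2;
{noformat}

As the report shows, the same query succeeds in Hive, whereas Spark SQL 1.2.0 builds a HadoopRDD for every partition registered in the metastore and fails in FileInputFormat.listStatus as soon as one path does not exist (visible at the top of the stack trace). A guard of roughly the following shape, filtering partition paths through the FileSystem before the scan is constructed, would avoid the failure. This is an illustrative sketch, not the actual patch that shipped in 1.4.0:

{noformat}
// Illustrative sketch only, not the SPARK-5068 patch itself: drop partition
// paths whose directories no longer exist before handing them to the input format.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

def existingPartitionPaths(paths: Seq[String], conf: Configuration): Seq[String] =
  paths.filter { p =>
    val path = new Path(p)
    // getFileSystem resolves the scheme (e.g. hdfs://jeanlyn:9000) from the path
    path.getFileSystem(conf).exists(path)
  }
{noformat}

One design consideration: an existence check costs a NameNode round trip per partition at planning time, which is presumably why the released fix made the verification opt-in via a configuration flag (spark.sql.hive.verifyPartitionPath, if I recall the name correctly; the flag name is not confirmed by this thread).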



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (SPARK-5068) When the path is not found in HDFS, we can't get the result

2015-01-05 Thread jeanlyn (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jeanlyn updated SPARK-5068:
---
Fix Version/s: (was: 1.2.1)




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (SPARK-5068) When the path is not found in HDFS, we can't get the result

2015-01-04 Thread jeanlyn (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jeanlyn updated SPARK-5068:
---
Fix Version/s: 1.2.1




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (SPARK-5068) When the path is not found in HDFS, we can't get the result

2015-01-03 Thread jeanlyn (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jeanlyn updated SPARK-5068:
---
Description: 

[jira] [Updated] (SPARK-5068) When the path is not found in HDFS, we can't get the result

2015-01-03 Thread jeanlyn (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jeanlyn updated SPARK-5068:
---
Description: 