Re: What is the difference between forEachAsync vs forEachPartitionAsync?

2017-04-02 Thread kant kodali
Wait, RDD operations should in fact execute in parallel, right? So if I call
rdd.forEachAsync, that should execute in parallel, shouldn't it? I guess I am a
little confused about what the difference really is between forEachAsync and
forEachPartitionAsync, besides passing a Tuple vs. an Iterator of Tuples to
the lambda, respectively.
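
A minimal sketch of the two calls (method names as in Spark's AsyncRDDActions;
the data and the local master are just for illustration, and the println output
lands in executor logs on a real cluster):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.concurrent.Await
import scala.concurrent.duration.Duration

val sc  = new SparkContext(new SparkConf().setAppName("async-demo").setMaster("local[*]"))
val rdd = sc.parallelize(1 to 100, 4)

// foreachAsync: the function gets one element at a time, but the action still
// runs as distributed tasks (one per partition); "async" means the call returns
// a FutureAction immediately instead of blocking the driver.
val f1 = rdd.foreachAsync(x => println(x))

// foreachPartitionAsync: the function gets one Iterator per partition, so
// per-partition setup (e.g. opening a connection) happens once per partition.
val f2 = rdd.foreachPartitionAsync(it => it.foreach(println))

// FutureAction extends scala.concurrent.Future, so it can be awaited or composed.
Await.result(f1, Duration.Inf)
Await.result(f2, Duration.Inf)
```

So the parallelism across partitions is the same in both cases; the difference is
element-wise vs. partition-wise invocation of the lambda, plus the non-blocking
FutureAction return value.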

On Sun, Apr 2, 2017 at 8:36 PM, kant kodali  wrote:

> Hi all,
>
> What is the difference between forEachAsync vs forEachPartitionAsync? I
> couldn't find any comments in the Javadoc. If I were to guess, here is
> what I would say, but please correct me if I am wrong.
>
> forEachAsync just iterates through values from all partitions one by one in
> an async manner.
>
> forEachPartitionAsync: fans out each partition and runs the lambda for each
> partition in parallel across different workers. The lambda here will
> iterate through values from that partition one by one in an async manner.
>
> Is this right? Or am I completely wrong?
>
> Thanks!
>


What is the difference between forEachAsync vs forEachPartitionAsync?

2017-04-02 Thread kant kodali
Hi all,

What is the difference between forEachAsync vs forEachPartitionAsync? I
couldn't find any comments in the Javadoc. If I were to guess, here is
what I would say, but please correct me if I am wrong.

forEachAsync just iterates through values from all partitions one by one in
an async manner.

forEachPartitionAsync: fans out each partition and runs the lambda for each
partition in parallel across different workers. The lambda here will
iterate through values from that partition one by one in an async manner.

Is this right? Or am I completely wrong?

Thanks!


org.apache.spark.sql.AnalysisException: resolved attribute(s) code#906 missing from code#1992,

2017-04-02 Thread grjohnson35
The exception org.apache.spark.sql.AnalysisException: resolved attribute(s)
code#906 missing from code#1992, is being thrown on a DataFrame. When I
print the schema, the DataFrame contains the field. Any help is much
appreciated.


val spark = SparkSession.builder()
  .master("spark://localhost:7077")
  .enableHiveSupport()
  .appName("Refresh Medical Claims")
  .config("fs.s3.awsAccessKeyId", S3_ACCESS)
  .config("fs.s3.awsSecretAccessKey", S3_SECRET)
  .config("fs.s3a.awsAccessKeyId", S3_ACCESS)
  .config("fs.s3a.awsSecretAccessKey", S3_SECRET)
  .getOrCreate()

val startTm: Long = getTimeMS()

def updateMinRtos8Thru27(spark: SparkSession, url: String, prop: Properties,
                         baseDF: DataFrame, revCdDF: DataFrame,
                         mcdDF: DataFrame, mccDF: DataFrame): DataFrame = {

  printDFSchema(mccDF, "mccDF")
  printDFSchema(baseDF, "baseDF")
  printDFSchema(revCdDF, "revCdDF")

  baseDF.join(mccDF,
      mccDF("medical_claim_id") <=> baseDF("medical_claim_id") &&
      mccDF("medical_claim_detail_id") <=> baseDF("medical_claim_detail_id"), "left")
    .join(revCdDF, revCdDF("revenue_code_padded_str") <=> mccDF("code"), "left")
    .where(revCdDF("code_type").equalTo("Revenue_Center"))
    .where(revCdDF("rtos_2_code").isNotNull)
    .where(revCdDF("rtos_2_code").between(8, 27))
    .groupBy(baseDF("medical_claim_id"), baseDF("medical_claim_detail_id"))
    .agg(min(revCdDF("rtos_2_code").alias("min_rtos_2_8_thru_27")))
    .agg(min(revCdDF("rtos_2_hierarchy").alias("min_rtos_2_8_thru_27_hier")))
    .select(baseDF("medical_claim_id"), baseDF("medical_claim_detail_id"),
      mccDF("code"), baseDF("revenue_code"), baseDF("rev_code_distinct_count"),
      baseDF("rtos_1_1_count"), baseDF("rtos_1_0_count"),
      baseDF("er_visit_flag"), baseDF("observation_stay_flag"))
}


mccDF displaying Schema
root
 |-- medical_claim_id: long (nullable = true)
 |-- medical_claim_detail_id: long (nullable = true)
 |-- from_date: date (nullable = true)
 |-- member_id: long (nullable = true)
 |-- member_history_id: long (nullable = true)
 |-- code: string (nullable = true)
 |-- code_type: string (nullable = true)

baseDF displaying Schema
root
 |-- medical_claim_id: long (nullable = true)
 |-- medical_claim_detail_id: long (nullable = true)
 |-- revenue_code: string (nullable = true)
 |-- rev_code_distinct_count: long (nullable = false)
 |-- rtos_1_1_count: long (nullable = false)
 |-- rtos_1_0_count: long (nullable = false)
 |-- er_visit_flag: integer (nullable = true)
 |-- observation_stay_flag: long (nullable = false)

revCdDF displaying Schema
root
 |-- revenue_code_int: integer (nullable = false)
 |-- revenue_code_padded_str: string (nullable = false)
 |-- revenue_code_desc: string (nullable = true)
 |-- rtos_1_code: integer (nullable = true)
 |-- rtos_2_code: integer (nullable = true)
 |-- rtos_2_desc: string (nullable = true)
 |-- rtos_2_hierarchy: integer (nullable = true)
 |-- rtos_3_code: integer (nullable = true)
 |-- rtos_3_desc: string (nullable = true)

Exception in thread "main" org.apache.spark.sql.AnalysisException: resolved
attribute(s) code#906 missing from
code#1992,revenue_code#1353,medical_claim_id#901L,rtos_2_desc#5,from_date#1989,rtos_1_1_count#1367L,medical_claim_detail_id#902L,medical_claim_detail_id#1988L,rtos_2_hierarchy#6,revenue_code_desc#2,observation_stay_flag#1374L,medical_claim_id#1987L,revenue_code_padded_str#1,member_history_id#1991L,er_visit_flag#1372,member_id#1990L,code_type#1993,rtos_1_code#3,rtos_2_code#4,rtos_3_code#7,rtos_3_desc#8,rev_code_distinct_count#1365L,rtos_1_0_count#1369L,revenue_code_int#0
in operator !Join LeftOuter, (revenue_code_padded_str#1 <=> code#906);;
!Join LeftOuter, (revenue_code_padded_str#1 <=> code#906)
:- Join LeftOuter, ((medical_claim_id#901L <=> medical_claim_id#901L) &&
(medical_claim_detail_id#902L <=> medical_claim_detail_id#902L))
:  :- Sort [medical_claim_id#901L ASC NULLS FIRST,
medical_claim_detail_id#902L ASC NULLS FIRST], true
:  :  +- Aggregate [medical_claim_id#901L, medical_claim_detail_id#902L,
code#906], [medical_claim_id#901L, medical_claim_detail_id#902L, code#906 AS
revenue_code#1353, count(distinct code#906) AS
rev_code_distinct_count#1365L, count(CASE WHEN (rtos_1_code#3 = 1) THEN
rtos_1_code#3 ELSE cast(null as int) END) AS rtos_1_1_count#1367L,
count(CASE WHEN (rtos_1_code#3 = 0) THEN rtos_1_code#3 ELSE cast(null as
int) END) AS rtos_1_0_count#1369L, max(CASE WHEN lpad(code#906, 4, 0) IN
(0450,0452,0456,0459) THEN 1 ELSE 0 END) AS er_visit_flag#1372,
count(distinct CASE WHEN (rtos_2_code#4 = 9) THEN 1 ELSE cast(null as
string) END) AS observation_stay_flag#1374L]
:  : +- Project [medical_claim_id#901L, medical_claim_detail_id#902L,
code#906, code_type#907, rtos_1_code#3, rtos_2_code#4, rtos_2_hierarchy#6,
line_er_visit_flag#1332, CASE WHEN (rtos_2_code#4 = 9) THEN 1 ELSE 0 END AS
line_observation_stay_flag#1342]
:  :+- Project [medical_claim_id#901L, medical_claim_detail_id#902L,
code#906, 
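
For what it's worth, this particular "resolved attribute(s) ... missing" error
often shows up when column references taken from the original DataFrame
variables (e.g. mccDF("code")) are used against a plan in which the analyzer has
already re-resolved those attributes after several joins. A minimal, hedged
sketch of one common workaround: alias the inputs and refer to columns through
the aliases. The names match the snippet above, but this is not a verified fix
for this exact query.

```scala
import org.apache.spark.sql.functions.col

val base = baseDF.alias("base")
val mcc  = mccDF.alias("mcc")
val rev  = revCdDF.alias("rev")

val joined = base
  .join(mcc,
    col("mcc.medical_claim_id") <=> col("base.medical_claim_id") &&
    col("mcc.medical_claim_detail_id") <=> col("base.medical_claim_detail_id"), "left")
  .join(rev, col("rev.revenue_code_padded_str") <=> col("mcc.code"), "left")

// Downstream filters/selects then use the qualified names, e.g. col("mcc.code")
// and col("rev.rtos_2_code"), so every reference resolves against the joined
// plan rather than against the original DataFrame objects.
```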

Re: Graph Analytics on HBase with HGraphDB and Spark GraphFrames

2017-04-02 Thread Irving Duran
Thanks for the share!


Thank You,

Irving Duran

On Sun, Apr 2, 2017 at 7:19 PM, Felix Cheung 
wrote:

> Interesting!
>
> --
> *From:* Robert Yokota 
> *Sent:* Sunday, April 2, 2017 9:40:07 AM
> *To:* user@spark.apache.org
> *Subject:* Graph Analytics on HBase with HGraphDB and Spark GraphFrames
>
> Hi,
>
> In case anyone is interested in analyzing graphs in HBase with Apache
> Spark GraphFrames, this might be helpful:
>
> https://yokota.blog/2017/04/02/graph-analytics-on-hbase-with-hgraphdb-and-spark-graphframes/
>


Re: Graph Analytics on HBase with HGraphDB and Spark GraphFrames

2017-04-02 Thread Felix Cheung
Interesting!


From: Robert Yokota 
Sent: Sunday, April 2, 2017 9:40:07 AM
To: user@spark.apache.org
Subject: Graph Analytics on HBase with HGraphDB and Spark GraphFrames

Hi,

In case anyone is interested in analyzing graphs in HBase with Apache Spark 
GraphFrames, this might be helpful:

https://yokota.blog/2017/04/02/graph-analytics-on-hbase-with-hgraphdb-and-spark-graphframes/


Re: Looking at EMR Logs

2017-04-02 Thread Paul Tremblay
Thanks. That seems to work great, except EMR doesn't always copy the logs
to S3. The behavior seems inconsistent, and I am debugging it now.

On Fri, Mar 31, 2017 at 7:46 AM, Vadim Semenov 
wrote:

> You can provide your own log directory, where the Spark logs will be saved and
> which you can replay afterwards.
>
> Set this in your job: `spark.eventLog.dir=s3://bucket/some/directory` and
> run it.
> Note: the path `s3://bucket/some/directory` must exist before you run your
> job; it will not be created automatically.
>
> The Spark HistoryServer on EMR won't show you anything because it's
> looking for logs in `hdfs:///var/log/spark/apps` by default.
>
> After that you can either copy the log files from s3 to the hdfs path
> above, or you can copy them locally to `/tmp/spark-events` (the default
> directory for spark logs) and run the history server like:
> ```
> cd /usr/local/src/spark-1.6.1-bin-hadoop2.6
> sbin/start-history-server.sh
> ```
> and then open http://localhost:18080
>
>
>
>
> On Thu, Mar 30, 2017 at 8:45 PM, Paul Tremblay 
> wrote:
>
>> I am looking for tips on evaluating my Spark job after it has run.
>>
>> I know that right now I can look at the history of jobs through the web
>> ui. I also know how to look at the current resources being used by a
>> similar web ui.
>>
>> However, I would like to look at the logs after the job is finished to
>> evaluate such things as how many tasks were completed, how many executors
>> were used, etc. I currently save my logs to S3.
>>
>> Thanks!
>>
>> Henry
>>
>> --
>> Paul Henry Tremblay
>> Robert Half Technology
>>
>
>


-- 
Paul Henry Tremblay
Robert Half Technology
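
As a side note, the event-log settings mentioned above can also be set
programmatically; a minimal sketch, assuming a bucket path you control (which,
as pointed out above, must exist before the job runs):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("emr-job")                                   // app name is just an example
  .set("spark.eventLog.enabled", "true")                   // turn event logging on
  .set("spark.eventLog.dir", "s3://bucket/some/directory") // pre-existing S3 path
```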


Graph Analytics on HBase with HGraphDB and Spark GraphFrames

2017-04-02 Thread Robert Yokota
Hi,

In case anyone is interested in analyzing graphs in HBase with Apache Spark
GraphFrames, this might be helpful:

https://yokota.blog/2017/04/02/graph-analytics-on-hbase-with-hgraphdb-and-spark-graphframes/


Re: Spark SQL 2.1 Complex SQL - Query Planning Issue

2017-04-02 Thread Sathish Kumaran Vairavelu
Please let me know if anybody has any thoughts on this issue.

On Thu, Mar 30, 2017 at 10:37 PM Sathish Kumaran Vairavelu <
vsathishkuma...@gmail.com> wrote:

> Also, is it possible to cache the logical plan and parsed query so that they
> can be reused in subsequent executions? It would improve overall query
> performance, particularly in streaming jobs.
> On Thu, Mar 30, 2017 at 10:06 PM Sathish Kumaran Vairavelu <
> vsathishkuma...@gmail.com> wrote:
>
> Hi Ayan,
>
> I have searched Spark configuration options but couldn't find one to pin
> execution plans in memory. Can you please help?
>
>
> Thanks
>
> Sathish
>
> On Thu, Mar 30, 2017 at 9:30 PM ayan guha  wrote:
>
> I think there is an option of pinning execution plans in memory to avoid
> such scenarios
>
> On Fri, Mar 31, 2017 at 1:25 PM, Sathish Kumaran Vairavelu <
> vsathishkuma...@gmail.com> wrote:
>
> Hi Everyone,
>
> I have a complex SQL query with approx. 2000 lines of code that works with 50+
> tables and 50+ left joins and transformations. All the tables are fully
> cached in memory with sufficient storage memory and working memory. The
> issue is that, after the query is launched for execution, it takes
> approximately 40 seconds to appear under Jobs/SQL in the application UI.
>
> While the execution takes only 25 seconds, the execution is delayed by 40
> seconds by the scheduler, so the total runtime of the query becomes 65
> seconds (40s + 25s). Also, there are enough cores available during this wait
> time. I couldn't figure out why the DAG scheduler is delaying the execution by
> 40 seconds. Is this due to the time taken for query parsing and query planning
> for the complex SQL? If that's the case, how do we optimize this query
> parsing and query planning time in Spark? Any help would be helpful.
>
>
> Thanks
>
> Sathish
>
>
>
>
> --
> Best Regards,
> Ayan Guha
>
>
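
One rough way to confirm whether the 40 seconds really go into analysis and
optimization rather than scheduling is to force the logical planning separately
from execution; a minimal diagnostic sketch, where complexSql is just a
placeholder for the 2000-line query text:

```scala
val df = spark.sql(complexSql)   // complexSql: placeholder for the large query string

val t0 = System.nanoTime()
df.queryExecution.optimizedPlan  // forces analysis and logical optimization only
val t1 = System.nanoTime()

df.count()                       // physical planning plus actual execution
val t2 = System.nanoTime()

println(f"planning: ${(t1 - t0) / 1e9}%.1f s, execution: ${(t2 - t1) / 1e9}%.1f s")
```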


Re: Update DF record with delta data in spark

2017-04-02 Thread Jörn Franke
If you trust that your delta file is correct, then this might be the way
forward. You just have to keep in mind that sometimes you can have several
delta files in parallel, and you need to apply them in the correct order, or
otherwise a deleted row might reappear. Things get messier if a delta file
cannot be loaded and new deltas arrive: you have to wait until the bad delta
file can be loaded before the others, etc. Delta files are usually a messy
thing that requires much more testing effort, and one has to think carefully
about whether this is worth it.

> On 2. Apr 2017, at 15:57, Selvam Raman  wrote:
> 
> Hi,
> 
> Table 1:(old File)
> 
> name  number  salray
> Test1 1   1
> Test2 2   1
> 
> Table 2: (Delta File)
> 
> name number   salray
> Test1 1   4
> Test3 3   2
> 
> 
> I do not have a date stamp field in this table. The composite key is the name
> and number fields.
> 
> Expected Result
> 
> name  number  salray
> Test1 1   4
> Test2 2   1
> Test3 3   2
> 
> 
> Current approach:
> 
> 1) Delete row in table1 where table1.composite key = table2.composite key.
> 2) Union all table and table2 to get updated result.
> 
> 
> Is this the right approach? Is there any other way to achieve it?
> 
> -- 
> Selvam Raman
> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"


Represent documents as a sequence of wordID & frequency and perform PCA

2017-04-02 Thread Old-School
Imagine that 4 documents exist as shown below:

D1: the cat sat on the mat
D2: the cat sat on the cat
D3: the cat sat
D4: the mat sat

where each word in the vocabulary can be translated to its wordID:

0 the
1 cat
2 sat
3 on
4 mat

Now every document can be represented using sparse vectors as shown below:

Vectors.sparse(5, Seq((0, 2.0), (1, 1.0), (2, 1.0), (3, 1.0), (4, 1.0))),
Vectors.sparse(5, Seq((0, 2.0), (1, 2.0), (2, 1.0), (3, 1.0))),
Vectors.sparse(5, Seq((0, 1.0), (1, 1.0), (2, 1.0))),
Vectors.sparse(5, Seq((0, 1.0), (2, 1.0), (4, 1.0)))
and finally, principal components can be computed as follows:

import org.apache.spark.mllib.linalg.{Matrix, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix

val data = Array(
  Vectors.sparse(5, Seq((0, 2.0), (1, 1.0), (2, 1.0), (3, 1.0), (4, 1.0))),
  Vectors.sparse(5, Seq((0, 2.0), (1, 2.0), (2, 1.0), (3, 1.0))),
  Vectors.sparse(5, Seq((0, 1.0), (1, 1.0), (2, 1.0))),
  Vectors.sparse(5, Seq((0, 1.0), (2, 1.0), (4, 1.0))))

val dataRDD = sc.parallelize(data)
val mat: RowMatrix = new RowMatrix(dataRDD)
val pc: Matrix = mat.computePrincipalComponents(4)
What I want to do is to read the following dataset and represent each
document using sparse vectors like the ones above, in order to compute the
principal components.


In the form: docID wordID count


1 2 1
1 39 1
1 42 3
1 77 1
1 95 1
1 96 1
2 105 1
2 108 1
3 133 3

However, I am not quite sure how to read and represent the dataset as sparse
vectors. Any help would be much appreciated.
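
A minimal sketch of one way to do this (the input path, the vocabulary size,
and the assumption that every wordID is smaller than it are illustrative, not
from the original post): group the (wordID, count) pairs by docID, hand each
group to Vectors.sparse, and feed the resulting RDD[Vector] to RowMatrix as
above.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

val vocabSize = 1000  // assumed: strictly greater than the largest wordID

val docVectors = sc.textFile("docs.txt")                  // lines of "docID wordID count"
  .map(_.trim.split("\\s+"))
  .map { case Array(doc, word, cnt) => (doc.toLong, (word.toInt, cnt.toDouble)) }
  .groupByKey()                                           // all (wordID, count) pairs per doc
  .map { case (_, pairs) => Vectors.sparse(vocabSize, pairs.toSeq) }

val pc = new RowMatrix(docVectors).computePrincipalComponents(4)
```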






Update DF record with delta data in spark

2017-04-02 Thread Selvam Raman
Hi,

Table 1: (old file)

name    number  salray
Test1   1       1
Test2   2       1

Table 2: (delta file)

name    number  salray
Test1   1       4
Test3   3       2


I do not have a date stamp field in this table. The composite key is the name
and number fields.

Expected Result

name    number  salray
Test1   1       4
Test2   2       1
Test3   3       2


Current approach:

1) Delete row in table1 where table1.composite key = table2.composite key.
2) Union all table and table2 to get updated result.


Is this the right approach? Is there any other way to achieve it?
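
A minimal sketch of the delete-then-union approach described above, assuming
DataFrames oldDF and deltaDF with the composite key (name, number); the
variable names are illustrative:

```scala
val keyCols = Seq("name", "number")

// Keep only old rows whose composite key does NOT appear in the delta file ...
val unchanged = oldDF.join(deltaDF, keyCols, "left_anti")

// ... then append every delta row (both updated keys and brand-new keys).
// Assumes both DataFrames have the same columns in the same order.
val merged = unchanged.union(deltaDF)
```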

-- 
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"


Does Apache Spark use any Dependency Injection framework?

2017-04-02 Thread kant kodali
Hi All,

I am wondering if I can get the SparkConf object through dependency injection.
I currently use the HOCON library to store all the key/value pairs required to
construct a SparkConf. The problem is that I created multiple client jars (by
client jars I mean the ones we supply to spark-submit to run our app), each of
them requiring its own config. It would be nice to have the SparkConf created
by the DI framework depending on the client jar we want to run. I am assuming
someone must have done this?

Thanks!
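
Short of a full DI framework, one lightweight pattern is to build the SparkConf
from a per-client HOCON file chosen at startup; a minimal sketch using Typesafe
Config, where the resource names and the "spark" block layout are assumptions:

```scala
import com.typesafe.config.ConfigFactory
import org.apache.spark.SparkConf
import scala.collection.JavaConverters._

// Loads e.g. client-a.conf from the client jar's resources; expects a block like
//   spark { executor.memory = "2g" }   (names here are illustrative)
def sparkConfFrom(resource: String): SparkConf = {
  val sparkSection = ConfigFactory.load(resource).getConfig("spark")
  val conf = new SparkConf()
  sparkSection.entrySet().asScala.foreach { e =>
    conf.set("spark." + e.getKey, e.getValue.unwrapped().toString)
  }
  conf
}
```

A DI container could then simply bind SparkConf to the result of this factory,
selected per client jar.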


Re: strange behavior of spark 2.1.0

2017-04-02 Thread Jiang Jacky
Thank you for replying. 
Actually, there is no message coming in during the exception, and there is no
OOME in any executor. What I am suspecting is that it might be caused by AWL.

> On Apr 2, 2017, at 5:22 AM, Timur Shenkao  wrote:
> 
> Hello,
> It's difficult to tell without details.
> I believe one of the executors dies because of OOM or some Runtime Exception 
> (some unforeseen dirty data row).
> Less probable is GC stop-the-world pause when incoming message rate increases 
> drastically.
> 
> 
>> On Saturday, April 1, 2017, Jiang Jacky  wrote:
>> Hello, Guys
>> I am running Spark Streaming 2.1.0; the Scala version was tried on
>> 2.11.7 and 2.11.4, and it is consuming from JMS. Recently, I have gotten the
>> following error:
>> "ERROR scheduler.ReceiverTracker: Deregistered receiver for stream 0: 
>> Stopped by driver"
>> 
>> This error can occur randomly; it might be after a couple of hours or a couple
>> of days. Besides this error, everything is perfect.
>> When the error happens, my job is stopped completely. No other error can be
>> found.
>> I am running on top of YARN and tried to look up the error through the YARN
>> logs and containers; no further information appears there. The job is just
>> stopped from the driver gracefully. BTW, I have a customized receiver, but I
>> do not think it happened in the receiver either: there is no error or exception
>> from the receiver, and I can also track that the stop command is sent from the
>> "onStop" function in the receiver.
>> 
>> FYI, the driver is not consuming any large amount of memory; there is no RDD
>> "collect" command in the driver. I have also checked the container log for each
>> executor and cannot find any further error.
>> 
>> 
>> 
>> 
>> The following is my conf for the spark context
>> val conf = new SparkConf().setAppName(jobName).setMaster(master)
>>   .set("spark.hadoop.validateOutputSpecs", "false")
>>   .set("spark.driver.allowMultipleContexts", "true")
>>   .set("spark.streaming.receiver.maxRate", "500")
>>   .set("spark.streaming.backpressure.enabled", "true")
>>   .set("spark.streaming.stopGracefullyOnShutdown", "true")
>>   .set("spark.eventLog.enabled", "true");
>> 
>> If you have any idea or suggestion, please let me know. I would appreciate a
>> solution.
>> 
>> Thank you so much
>> 


Re: Partitioning strategy

2017-04-02 Thread Jörn Franke
You can always repartition, but maybe for your use case different RDDs with the
same data but different partitioning strategies could make sense. It may also
make sense to choose an appropriate format on disk (ORC, Parquet). You also have
to choose based on the users' non-functional requirements.

> On 2. Apr 2017, at 12:32,  
>  wrote:
> 
> Hi,
>  
> I have an RDD with 4 years' data and, say, 20 partitions. At runtime, the user
> can decide to select a few months or years of the RDD. That means, based upon the
> user's time selection, the RDD is filtered, and further transformations and
> actions are performed on the filtered RDD. And, as Spark says, a child RDD gets
> its partitions from the parent RDD.
>
> Therefore, is there any way to decide the partitioning strategy after the filter
> operation?
>  
> Regards,
> Jasbir Singh


Partitioning strategy

2017-04-02 Thread jasbir.sing
Hi,

I have an RDD with 4 years' data and, say, 20 partitions. At runtime, the user can
decide to select a few months or years of the RDD. That means, based upon the
user's time selection, the RDD is filtered, and further transformations and actions
are performed on the filtered RDD. And, as Spark says, a child RDD gets its
partitions from the parent RDD.

Therefore, is there any way to decide the partitioning strategy after the filter
operation?

Regards,
Jasbir Singh
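
A minimal sketch of re-deciding the partitioning after the filter, along the
lines Jörn suggests; the record type, its fields, and the partition counts are
assumptions for illustration:

```scala
import org.apache.spark.HashPartitioner

// Illustrative record type; the real data and fields are assumptions.
case class Event(year: Int, month: Int, value: Double)

val filtered = fullRdd.filter(_.year == selectedYear)   // fullRdd: RDD[Event], assumed

// Option 1: simply rebalance the (now smaller) filtered data.
val rebalanced = filtered.repartition(8)

// Option 2: key the records and install an explicit partitioner, so later
// per-key operations (e.g. aggregations by month) reuse it without reshuffling.
val byMonth = filtered.map(e => (e.month, e)).partitionBy(new HashPartitioner(12))
```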





Re: strange behavior of spark 2.1.0

2017-04-02 Thread Timur Shenkao
Hello,
It's difficult to tell without details.
I believe one of the executors dies because of OOM or some Runtime
Exception (some unforeseen dirty data row).
Less probable is GC stop-the-world pause when incoming message rate
increases drastically.


On Saturday, April 1, 2017, Jiang Jacky  wrote:

> Hello, Guys
> I am running Spark Streaming 2.1.0; the Scala version was tried on
> 2.11.7 and 2.11.4, and it is consuming from JMS. Recently, I have gotten the
> following error:
> *"ERROR scheduler.ReceiverTracker: Deregistered receiver for stream 0:
> Stopped by driver"*
>
> *This error can occur randomly; it might be after a couple of hours or a couple
> of days. Besides this error, everything is perfect.*
> When the error happens, my job is stopped completely. No other error can be
> found.
> I am running on top of YARN and tried to look up the error through the YARN
> logs and containers; no further information appears there. The job is just
> stopped from the driver gracefully. BTW, I have a customized receiver, but I
> do not think it happened in the receiver either: there is no error or exception
> from the receiver, and I can also track that the stop command is sent from the
> "onStop" function in the receiver.
>
> FYI, the driver is not consuming any large amount of memory; there is no RDD
> "collect" command in the driver. I have also checked the container log for each
> executor and cannot find any further error.
>
>
>
>
> The following is my conf for the spark context
> val conf = new SparkConf().setAppName(jobName).setMaster(master)
>   .set("spark.hadoop.validateOutputSpecs", "false")
>   .set("spark.driver.allowMultipleContexts", "true")
>   .set("spark.streaming.receiver.maxRate", "500")
>   .set("spark.streaming.backpressure.enabled", "true")
>   .set("spark.streaming.stopGracefullyOnShutdown", "true")
>   .set("spark.eventLog.enabled", "true");
>
> If you have any idea or suggestion, please let me know. I would appreciate a
> solution.
>
> Thank you so much
>
>


read binary file in PySpark

2017-04-02 Thread Yogesh Vyas
Hi,

I am trying to read a binary file in PySpark using the API binaryRecords(path,
recordLength), but it is giving all values as ['\x00', '\x00', '\x00',
'\x00',].

But when I try to read the same file using binaryFiles(), it gives me a
correct RDD, but in the form of key-value pairs, where the value is a string.

I want to get the byte array out of the binary file. How can I get it?

Regards,
Yogesh