Scala functions for dataframes

2017-02-23 Thread Advait Mohan Raut
Hi Team,

I am using Scala Spark Dataframes for data operations over CSV files.

Several process flows share common transformation code, so I would like to 
factor it into Scala functions [defined with def fn_name()]. All process flows 
would then call the functionality implemented inside these functions.


Typical transformations on the data include the following:

  1.  Modifying multiple columns
  2.  Changing a column conditioned on one or more other columns
  3.  Date and time format manipulations
  4.  Applying regexes over one or more columns
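
As a rough sketch only, the four kinds of transformation listed above might be written with the DataFrame column functions (org.apache.spark.sql.functions, Spark 2.x) rather than SQL strings. The object name, the helper names, the column names (name, amount, status, event_date, phone), the threshold and the sample row are all hypothetical, chosen purely for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

// Hypothetical helpers; column names and rules are illustrative assumptions.
object TransformSketch {
  // 1. Modify multiple columns: fold the same change over a list of names.
  def trimAll(df: DataFrame, cols: Seq[String]): DataFrame =
    cols.foldLeft(df)((acc, c) => acc.withColumn(c, trim(col(c))))

  // 2. Change one column conditioned on another column.
  def flagLarge(df: DataFrame): DataFrame =
    df.withColumn("status",
      when(col("amount") > 100, lit("LARGE")).otherwise(col("status")))

  // 3. Date format manipulation: parse dd/MM/yyyy, write back yyyy-MM-dd.
  def normalizeDate(df: DataFrame): DataFrame =
    df.withColumn("event_date",
      from_unixtime(unix_timestamp(col("event_date"), "dd/MM/yyyy"), "yyyy-MM-dd"))

  // 4. Regex over one or more columns: keep digits only.
  def keepDigits(df: DataFrame, cols: Seq[String]): DataFrame =
    cols.foldLeft(df)((acc, c) =>
      acc.withColumn(c, regexp_replace(col(c), "[^0-9]", "")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("transform-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample row standing in for a line read from a CSV file.
    val df = Seq((" Ann ", 150, "NEW", "23/02/2017", "+00-555-0100"))
      .toDF("name", "amount", "status", "event_date", "phone")

    val cleaned = trimAll(df, Seq("name", "status"))
    val result = keepDigits(normalizeDate(flagLarge(cleaned)), Seq("phone"))
    result.show()
    spark.stop()
  }
}
```

Each helper takes a DataFrame and returns a DataFrame, so any process flow can chain them in whatever order it needs.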

For such transformations:


  1.  What is the best way to perform these operations?
  2.  Can we do such operations on dataframes without SQL queries?
  3.  If there is no choice other than running SQL queries, what is the best 
way to write generic Scala functions for that?
  4.  If the input dataframes all have different schemas but share the same 
names for the columns we need to process, what is the preferred approach in 
that case?
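
One possible shape for such generic functions, sketched under assumptions: each shared step is a DataFrame => DataFrame value that refers to columns by name only, so it does not depend on the rest of the schema. The names SharedSteps, mapColumns and cleanCodes, and the columns code and region, are hypothetical; Dataset.transform and withColumn are standard Spark API.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, trim, upper}

// Hypothetical shared-step library; names are illustrative assumptions.
object SharedSteps {
  // Build a step that applies `f` to each named column that is present,
  // and leaves every other column (whatever the schema) untouched.
  def mapColumns(names: Seq[String])(f: Column => Column): DataFrame => DataFrame =
    df => names
      .filter(n => df.columns.contains(n))
      .foldLeft(df)((acc, n) => acc.withColumn(n, f(col(n))))

  // A reusable step defined once and shared by all process flows.
  val cleanCodes: DataFrame => DataFrame =
    mapColumns(Seq("code", "region"))(c => upper(trim(c)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("shared-steps-sketch")
      .getOrCreate()
    import spark.implicits._

    // Two inputs with different schemas that share the column name "code".
    val orders = Seq((1, " ab ", 9.5)).toDF("id", "code", "price")
    val stores = Seq((" north ", "x1 ", "Pune")).toDF("region", "code", "city")

    orders.transform(cleanCodes).show()
    stores.transform(cleanCodes).show()
    spark.stop()
  }
}
```

Skipping absent columns is a design choice made for this sketch; a flow that requires a column could fail fast instead by checking df.columns before applying the step.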

Please let me know if you need more clarification on this.




Regards
Advait





The information transmitted herewith is sensitive information intended only for 
use to the individual or entity to which it is addressed. If the reader of this 
message is not the intended recipient, you are hereby notified that any review, 
retransmission, dissemination, distribution, copying or other use of, or taking 
of any action in reliance upon, this information is strictly prohibited. If you 
have received this communication in error, please contact the sender and delete 
the material from your computer.

WARNING: E-mail communications cannot be guaranteed to be timely, secure, 
error-free or virus-free. The recipient of this communication should check this 
e-mail and each attachment for the presence of viruses. The sender does not 
accept any liability for any errors or omissions in the content of this 
electronic communication which arises as a result of e-mail transmission.

Apache Spark 2.0.0 on Microsoft Windows Create Dataframe

2016-09-16 Thread Advait Mohan Raut
Hi

I am trying to run Spark 2.0.0 on Microsoft Windows without Hadoop or Hive. I 
am running it in local mode, i.e. cmd> spark-shell, and the shell starts. When 
I run the sample example 
here<http://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection>
 provided in the documentation, Spark fails to create the DataFrame at 
.toDF()

// Create an RDD of Person objects from a text file, convert it to a Dataframe
val peopleDF = spark.sparkContext
  .textFile("examples/src/main/resources/people.txt")
  .map(_.split(","))
  .map(attributes => Person(attributes(0), attributes(1).trim.toInt))
  .toDF()


I have also run these steps separately (the split and the Person mapping), but 
it still fails at .toDF()

It gives a sequence of exceptions like:

HiveException: java.lang.RuntimeException:
Unable to instantiate org...SessionHiveMetaStoreClient
..
..
IllegalArgumentException: java.net.URISyntaxException: Relative path in the 
absolute URI


I have also defined the case class Person(name: String, age: Int)

I had successfully run code for Spark 1.4 on Windows. Is there any guide 
available for configuring Spark 2.0.0 on Windows for the environment described 
above? Or is this environment not supported? Your inputs will be appreciated.
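
One workaround that has been reported for the "Relative path in absolute URI" error with Spark 2.0.0 on Windows is to set spark.sql.warehouse.dir to an explicit file: URI. The sketch below assumes a standalone application rather than spark-shell, uses in-memory sample lines instead of people.txt, and has not been verified against the environment described here; the Hive metastore exception may have a separate cause. The object name WarehouseDirSketch and the spark-warehouse directory name are hypothetical.

```scala
import java.io.File
import org.apache.spark.sql.SparkSession

// Defined at top level so Spark can derive an encoder for toDF().
case class Person(name: String, age: Int)

object WarehouseDirSketch {
  // Absolute file: URI built from a local directory name (assumed location).
  val warehouseUri: String =
    new File("spark-warehouse").getAbsoluteFile.toURI.toString

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("warehouse-dir-sketch")
      .config("spark.sql.warehouse.dir", warehouseUri)
      .getOrCreate()
    import spark.implicits._

    // Sample lines standing in for examples/src/main/resources/people.txt.
    val peopleDF = spark.sparkContext
      .parallelize(Seq("Michael, 29", "Andy, 30"))
      .map(_.split(","))
      .map(attributes => Person(attributes(0), attributes(1).trim.toInt))
      .toDF()

    peopleDF.show()
    spark.stop()
  }
}
```

For spark-shell, the same property could presumably be passed on the command line with --conf spark.sql.warehouse.dir=<file: URI>.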

Source: 
http://stackoverflow.com/questions/39538544/apache-spark-2-0-0-on-microsoft-windows-create-dataframe
Similar Post: 
http://stackoverflow.com/questions/39402145/spark-windows-error-when-executing-todf



Regards
Advait Mohan Raut
Essex Lake Group LLC |
Mobile: +91-99-101-700-69 |
Email: adv...@essexlg.com<mailto:adv...@essexlg.com> |
Skype: advait.raut