I'm using Hadoop 1.0.4 and Spark 1.2.0.
I'm facing a strange issue. I have a requirement to read a small file from
HDFS and all its content has to be read in one shot. So I'm using the Spark
context's wholeTextFiles API, passing the HDFS URL for the file.
When I try this from a spark shell it works
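For reference, a minimal sketch of that approach in a shell-style session; the HDFS path and app name are placeholders, not from the original post:
import org.apache.spark.{SparkConf, SparkContext}
val sc = new SparkContext(new SparkConf().setAppName("WholeFileRead"))
// wholeTextFiles returns an RDD of (path, content) pairs, so the entire
// content of each small file arrives as a single string in one record
val files = sc.wholeTextFiles("hdfs://namenode:54310/user/hduser/smallfile.txt")
val content = files.first()._2   // the whole file as one string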
u have
> used for joining. So, records 1 and 4 should generate the same hash value.
> 3. group by using this new id (you have already linked the records) and
> pull out required fields.
>
> Please let the group know if it works...
>
> Best
> Ayan
>
> On Sat, Oct 31, 2015 a
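A rough illustration of the suggestion above, i.e. hashing the fields used for joining into a common id and then grouping on it; the file path and the choice of REF1 and AMOUNT as join fields are only assumptions for the example:
import org.apache.spark.SparkContext._   // pair-RDD operations in Spark 1.x
// assuming sc is an existing SparkContext (e.g. the spark-shell one)
val records = sc.textFile("hdfs://namenode:54310/user/hduser/recon_data.txt")
val keyed = records.map { line =>
  val f = line.split("\\|", -1)
  // build the linking id from the fields used for joining (indexes per the sample layout)
  val linkId = (f(6) + "|" + f(13)).hashCode
  (linkId, line)
}
val grouped = keyed.groupByKey()   // records that share the join fields land under one id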
Hi All,
I have a Hive table where data from 2 different sources (S1 and S2) gets
accumulated. Sample data below -
*RECORD_ID|SOURCE_TYPE|TRN_NO|DATE1|DATE2|BRANCH|REF1|REF2|REF3|REF4|REF5|REF6|DC_FLAG|AMOUNT|CURRENCY*
*1|S1|55|19-Oct-2015|19-Oct-2015|25602|999||41106|47311|379|9|004|999|99
Verify your executor/driver actually started with this option to
> rule out a config problem.
>
> On Wed, Jul 29, 2015 at 10:45 AM, Sarath Chandra
> wrote:
> > Yes.
> >
> > As mentioned in my mail at the end, I tried with both 256 and 512
> opt
single node Mesos cluster on my laptop having 4 CPUs and 12GB RAM.
On Wed, Jul 29, 2015 at 2:49 PM, fightf...@163.com
wrote:
> Hi, Sarath
>
> Did you try to set and increase spark.executor.extraJavaOptions
> -XX:PermSize= -XX:MaxPermSize=
>
>
> --------
Dear All,
I'm using -
=> Spark 1.2.0
=> Hive 0.13.1
=> Mesos 0.18.1
=> Spring
=> JDK 1.7
I've written a Scala program which
=> instantiates a Spark and Hive context
=> parses an XML file which provides the where clauses for queries
=> generates full-fledged Hive queries to be run on hi
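As a rough illustration of that setup, a minimal sketch of creating the contexts and running one generated query; the table name and where clause are placeholders, not taken from the actual XML:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
val sc = new SparkContext(new SparkConf().setAppName("XmlDrivenHiveQueries"))
val hiveContext = new HiveContext(sc)
// in the real program the where clause comes from the parsed XML file
val whereClause = "source_type = 'S1'"
val result = hiveContext.sql(s"SELECT * FROM recon_table WHERE $whereClause")
result.collect().foreach(println)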
(Test.java:7)*
Regards,
Sarath.
Thanks & Regards,
*Sarath Chandra Josyam*
Sr. Technical Architect
*Algofusion Technologies India Pvt. Ltd.*
Email: sarathchandra.jos...@algofusiontech.com
Phone: +91-80-65330112/113
Mobile: +91 8762491331
On Wed, Mar 4, 2015 at 5:08 PM, Sarath Chandra <
s
Hi,
I have a cluster running CDH 5.2.1 and a Mesos cluster (version 0.18.1).
Through an Oozie Java action I want to submit a Spark job to the Mesos
cluster. Before configuring it as an Oozie job I'm testing the Java action
from the command line and am getting the exception below. While running I'm
poin
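For context, a minimal sketch of pointing a Spark job at a Mesos master from code; the master URL and executor package location are placeholders:
import org.apache.spark.{SparkConf, SparkContext}
val conf = new SparkConf()
  .setAppName("OozieLaunchedJob")
  .setMaster("mesos://mesos-master:5050")   // placeholder Mesos master URL
  .set("spark.executor.uri", "hdfs://namenode:54310/spark/spark-1.2.0-bin-hadoop1.tgz")   // placeholder
val sc = new SparkContext(conf)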
Hi All,
I have a requirement to process a set of files in parallel, so I'm
submitting Spark jobs using Java's ExecutorService. But when I do it this
way, one or more jobs fail with status "EXITED". Earlier I tried with a
standalone Spark cluster, setting the job scheduling to "Fair Scheduling"
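A hedged sketch of turning on fair scheduling for jobs submitted concurrently from one SparkContext; the pool name is illustrative:
import org.apache.spark.{SparkConf, SparkContext}
val conf = new SparkConf()
  .setAppName("ParallelFileJobs")
  .set("spark.scheduler.mode", "FAIR")   // share the context fairly across concurrent jobs
val sc = new SparkContext(conf)
// each thread from the ExecutorService can tag its jobs with a scheduler pool
sc.setLocalProperty("spark.scheduler.pool", "filePool")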
Hi All,
I have a Java program which submits a Spark job to a standalone Spark
cluster (2 nodes; 10 cores (6+4); 12GB (8+4)). It is called by another
Java program through ExecutorService, which invokes it multiple times
with different sets of arguments and parameters. I have set spark memory
us
Hi All,
I'm executing a simple job in Spark which reads a file on HDFS, processes
the lines and saves the processed lines back to HDFS. All 3 stages are
happening correctly and I'm able to see the processed file on HDFS.
But on the Spark UI, the worker state is shown as "killed". And I'm
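One thing worth noting: in standalone mode the worker tears executors down when an application finishes, so a KILLED executor state after a successful run is not necessarily an error; calling sc.stop() at the end also lets the application shut down cleanly. A minimal sketch, with the paths and processing as placeholders:
// assuming sc is an existing SparkContext
val lines = sc.textFile("hdfs://namenode:54310/user/hduser/input.txt")
lines.map(_.trim).saveAsTextFile("hdfs://namenode:54310/user/hduser/output")   // placeholder processing
sc.stop()   // stop the context once the save has completed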
Hi All,
If my RDD holds an array/sequence of strings, how can I save them as an
HDFS file with each string on a separate line?
For example, if I write code as below, the output should get saved as an
HDFS file having one string per line
...
...
var newLines = lines.map(line => myfunc(line));
newLines.s
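Assuming the truncated call is saveAsTextFile, that is the usual way to get one string per line; a minimal sketch with placeholder paths and a placeholder myfunc:
import org.apache.spark.{SparkConf, SparkContext}
val sc = new SparkContext(new SparkConf().setAppName("SaveLines"))
def myfunc(line: String): String = line.trim   // placeholder for the real processing
val lines = sc.textFile("hdfs://namenode:54310/user/hduser/input.txt")
val newLines = lines.map(line => myfunc(line))
// saveAsTextFile writes one RDD element per line into part files under the output directory
newLines.saveAsTextFile("hdfs://namenode:54310/user/hduser/output")
// if the RDD instead held sequences of strings, flatten it first:
// newLines.flatMap(identity).saveAsTextFile("hdfs://namenode:54310/user/hduser/output")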
w your
> code since it may not be doing what you think.
>
> If you instantiate an object, it happens every time your function is
> called. map() is called once per data element; mapPartitions() once
> per partition. It depends.
>
> On Wed, Sep 10, 2014 at 3:25 PM, Sarath Ch
ableManagerClass in the function and therefore on the
> worker.
>
> mapPartitions is better if this creation is expensive.
>
> On Fri, Sep 5, 2014 at 3:06 PM, Sarath Chandra
> wrote:
> > Hi,
> >
> > I'm trying to migrate a map-reduce program to work with spark
> In the first instance, you create the object on the driver and try to
> serialize and copy it to workers. In the second, you're creating
> SomeUnserializableManagerClass in the function and therefore on the
> worker.
>
> mapPartitions is better if this creation is expensive.
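A small sketch of the difference being described; the manager class and its transform method are placeholders for the real library code, and lines is assumed to be an RDD[String]:
// placeholder standing in for the third-party class
class SomeUnserializableManagerClass { def transform(s: String): String = s }
// map: the constructor runs once per record
val perElement = lines.map { line =>
  val mgr = new SomeUnserializableManagerClass()
  mgr.transform(line)
}
// mapPartitions: the constructor runs once per partition and the instance is reused
val perPartition = lines.mapPartitions { iter =>
  val mgr = new SomeUnserializableManagerClass()
  iter.map(line => mgr.transform(line))
}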
> You can bring those classes out of the library and Serialize them
> (implement Serializable). It is not the right way of doing it, though it
> solved a few of my similar problems.
>
> Thanks
> Best Regards
>
>
> On Fri, Sep 5, 2014 at 7:36 PM, Sarath Chandra <
> sa
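A related workaround (not exactly the one quoted above) is a small serializable wrapper that builds the unserializable object lazily on each worker; the names are invented, and the placeholder class and lines RDD from the earlier sketch are reused:
class ManagerWrapper extends Serializable {
  // only the wrapper is shipped; the real instance is created on the worker at first use
  @transient lazy val mgr = new SomeUnserializableManagerClass()
  def transform(line: String): String = mgr.transform(line)
}
val wrapper = new ManagerWrapper()
val out = lines.map(line => wrapper.transform(line))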
Hi,
I'm trying to migrate a map-reduce program to work with Spark. I migrated
the program from Java to Scala. The map-reduce program basically loads an
HDFS file and, for each line in the file, applies several transformation
functions available in various external libraries.
When I execute this o
n Thu, Jul 17, 2014 at 1:13 PM, Sarath Chandra <
sarathchandra.jos...@algofusiontech.com> wrote:
> No Sonal, I'm not doing any explicit call to stop context.
>
> If you see my previous post to Michael, the commented portion of the code
> is my requirement. When I run this over s
gards,
> Sonal
> Nube Technologies <http://www.nubetech.co>
>
> <http://in.linkedin.com/in/sonalgoyal>
>
>
>
>
> On Thu, Jul 17, 2014 at 12:51 PM, Sarath Chandra <
> sarathchandra.jos...@algofusiontech.com> wrote:
>
>> Hi Michael, Soumya,
Hi Michael, Soumya,
Can you please check and let me know what the issue is? What am I missing?
Let me know if you need any logs to analyze.
~Sarath
On Wed, Jul 16, 2014 at 8:24 PM, Sarath Chandra <
sarathchandra.jos...@algofusiontech.com> wrote:
> Hi Michael,
>
> Tried it.
ATH $CONFIG_OPTS test.Test4 spark://master:7077
"/usr/local/spark-1.0.1-bin-hadoop1"
hdfs://master:54310/user/hduser/file1.csv
hdfs://master:54310/user/hduser/file2.csv*
~Sarath
On Wed, Jul 16, 2014 at 8:14 PM, Michael Armbrust
wrote:
> What if you just run something like:
> *sc.te
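Presumably the suggestion is a trivial read-and-count to confirm the cluster executes jobs at all; for example, using one of the paths from the command above:
// assuming sc is the shell's SparkContext
val rdd = sc.textFile("hdfs://master:54310/user/hduser/file1.csv")
println(rdd.count())   // a trivial action just to verify jobs run on the cluster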
2014 at 7:59 PM, Soumya Simanta
wrote:
>
>
> Can you try submitting a very simple job to the cluster.
>
> On Jul 16, 2014, at 10:25 AM, Sarath Chandra <
> sarathchandra.jos...@algofusiontech.com> wrote:
>
> Yes it is appearing on the Spark UI, and remains there wit
Sarath
On Wed, Jul 16, 2014 at 7:48 PM, Soumya Simanta
wrote:
> When you submit your job, it should appear on the Spark UI. Same with the
> REPL. Make sure your job is submitted to the cluster properly.
>
>
> On Wed, Jul 16, 2014 at 10:08 AM, Sarath Chandra <
> sarathchandra.j
anything going
wrong, all are INFO messages.
What else do I need to check?
~Sarath
On Wed, Jul 16, 2014 at 7:23 PM, Soumya Simanta
wrote:
> Check your executor logs for the output or, if your data is not big, collect
> it in the driver and print it.
>
>
>
> On Jul 16, 2014, at 9:21 AM
Hi All,
I'm trying to do a simple record matching between 2 files and wrote the
following code -
import org.apache.spark.sql.SQLContext
import org.apache.spark.rdd.RDD
object SqlTest {
  case class Test(fld1:String, fld2:String, fld3:String, fld4:String,
fld5:String, fld6:Double, fld7:String)
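For completeness, a hedged sketch of how such a record-matching job might continue from that case class; the field separator, field positions and the join condition are assumptions, not taken from the original post:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
object SqlTest {
  case class Test(fld1: String, fld2: String, fld3: String, fld4: String,
                  fld5: String, fld6: Double, fld7: String)
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SqlTest"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD   // Spark 1.2: implicit RDD -> SchemaRDD conversion
    // assuming comma-separated files; adjust the separator and indexes to the real layout
    def load(path: String) = sc.textFile(path).map(_.split(",", -1)).map(f =>
      Test(f(0), f(1), f(2), f(3), f(4), f(5).toDouble, f(6)))
    load(args(0)).registerTempTable("file1")   // e.g. hdfs://master:54310/user/hduser/file1.csv
    load(args(1)).registerTempTable("file2")
    // the join condition below is only an example of what "record matching" might mean
    val matched = sqlContext.sql(
      "SELECT a.* FROM file1 a JOIN file2 b ON a.fld1 = b.fld1 AND a.fld6 = b.fld6")
    matched.collect().foreach(println)
    sc.stop()
  }
}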