Thanks, my issue was exactly that: the function that extracts the objects from the file reused the same object, only mutating it for each record. Creating a new object for each item solved the issue. Thank you very much for your reply. Best regards.
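For anyone hitting the same thing: the pitfall can be reproduced without Spark at all. Below is a minimal Java sketch (the names `MutableRecord` and `read` are hypothetical, made up for illustration) that simulates a reader which reuses one mutable holder object per record, the way Hadoop's RecordReader reuses a Writable. Storing the shared reference keeps only the last value written; copying each record preserves them all.

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseDemo {
    // Simulates a Hadoop-style record: one mutable holder, like a Writable.
    static class MutableRecord {
        int value;
        MutableRecord() {}
        MutableRecord(MutableRecord other) { this.value = other.value; }
    }

    // Reads n "records" through a single reused object. If copy is false,
    // every stored element is the same shared object (the bug); if true,
    // each element is a fresh copy (the fix suggested in the thread).
    static List<MutableRecord> read(int n, boolean copy) {
        MutableRecord shared = new MutableRecord();   // reused for every record
        List<MutableRecord> out = new ArrayList<>();
        for (int v = 1; v <= n; v++) {
            shared.value = v;                         // reader overwrites in place
            out.add(copy ? new MutableRecord(shared) : shared);
        }
        return out;
    }

    public static void main(String[] args) {
        List<MutableRecord> refs = read(3, false);    // caching the reused object
        List<MutableRecord> copies = read(3, true);   // copying per record

        // All shared references show the last value written:
        System.out.println(refs.get(0).value + " "
                + refs.get(1).value + " " + refs.get(2).value);   // prints "3 3 3"
        // The copies keep each record's own value:
        System.out.println(copies.get(0).value + " "
                + copies.get(1).value + " " + copies.get(2).value); // prints "1 2 3"
    }
}
```

This is the same reason the docs recommend copying Hadoop writables with a `map` before caching, sorting, or aggregating: any operation that holds on to the records after the reader has moved on will otherwise see many references to one object.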
> On 26 Feb 2015, at 22:25, Imran Rashid <iras...@cloudera.com> wrote:
>
> any chance your input RDD is being read from HDFS, and you are running into
> this issue (in the docs on SparkContext#hadoopFile):
>
> * '''Note:''' Because Hadoop's RecordReader class re-uses the same Writable
>   object for each record, directly caching the returned RDD or directly
>   passing it to an aggregation or shuffle operation will create many
>   references to the same object.
> * If you plan to directly cache, sort, or aggregate Hadoop writable objects,
>   you should first copy them using a `map` function.
>
> On Thu, Feb 26, 2015 at 10:38 AM, mrk91 <marcogaid...@gmail.com
> <mailto:marcogaid...@gmail.com>> wrote:
>
> Hello,
>
> I have an issue with the cartesian method. When I use it with the Java types
> everything is OK, but when I use it with an RDD made of objects I defined
> myself, it behaves very strangely depending on whether the RDD is cached or
> not (you can see here
> <http://stackoverflow.com/questions/28727823/creating-a-matrix-of-neighbors-with-spark-cartesian-issue>
> what happens).
>
> Is this due to a bug in its implementation, or are there any requirements on
> the objects passed to it?
> Thanks.
> Best regards.
> Marco
>
> View this message in context: Cartesian issue with user defined objects
> <http://apache-spark-user-list.1001560.n3.nabble.com/Cartesian-issue-with-user-defined-objects-tp21826.html>
> Sent from the Apache Spark User List mailing list archive
> <http://apache-spark-user-list.1001560.n3.nabble.com/> at Nabble.com.