Strangely, this works only for a very small set of rows; for very large datasets the partitioning apparently does not work. Is there a limit on the number of columns or rows when repartitioning by multiple columns?
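For context, repartition(new Column("a"), new Column("b")) hash-partitions rows on the key columns into a fixed number of buckets (spark.sql.shuffle.partitions, 200 by default), so the behavior is not a column- or row-count limit: distinct key combinations can land in the same bucket, and on a large dataset each partition will typically hold many groups. Below is a minimal plain-Java sketch of that rule, no Spark required. Spark uses Murmur3 internally; Objects.hash is only an illustrative stand-in here, so the bucket numbers differ from Spark's, but the mechanics are the same.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Sketch of hash partitioning as done by repartition(col("a"), col("b")):
// bucket = nonNegativeMod(hash(a, b), numPartitions).
// Spark uses Murmur3 on the key columns; Objects.hash is a stand-in.
public class HashPartitionSketch {

    static int partitionFor(long a, long b, int numPartitions) {
        int h = Objects.hash(a, b) % numPartitions;
        // pmod: keep the bucket index non-negative
        return h < 0 ? h + numPartitions : h;
    }

    public static void main(String[] args) {
        int numPartitions = 200; // spark.sql.shuffle.partitions default
        List<long[]> keys = Arrays.asList(
                new long[]{11L, 21L}, new long[]{11L, 22L},
                new long[]{12L, 23L}, new long[]{12L, 24L});
        for (long[] k : keys) {
            // Equal (a, b) keys always map to the same bucket; distinct
            // keys may collide, so one partition can hold several groups.
            System.out.println(Arrays.toString(k) + " -> partition "
                    + partitionFor(k[0], k[1], numPartitions));
        }
    }
}
```

The takeaway for mapPartitions is that each call may see rows from several (a, b) groups, so per-group logic has to re-group within the iterator rather than assume one group per partition.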
regards,
Imran

On Wed, Oct 18, 2017 at 11:00 AM, Imran Rajjad <raj...@gmail.com> wrote:
> yes..I think I figured out something like below
>
> Serialized Java Class
> -----------------
> public class MyMapPartition implements Serializable, MapPartitionsFunction<Row, Row> {
>     @Override
>     public Iterator<Row> call(Iterator<Row> iter) throws Exception {
>         ArrayList<Row> list = new ArrayList<Row>();
>         // ArrayNode array = mapper.createArrayNode();
>         Row row = null;
>         System.out.println("--------");
>         while (iter.hasNext()) {
>             row = iter.next();
>             System.out.println(row);
>             list.add(row);
>         }
>         System.out.println(">>>>");
>         return list.iterator();
>     }
> }
>
> Unit Test
> -----------
> JavaRDD<Row> rdd = jsc.parallelize(Arrays.asList(
>         RowFactory.create(11L, 21L, 1L),
>         RowFactory.create(11L, 22L, 2L),
>         RowFactory.create(11L, 22L, 1L),
>         RowFactory.create(12L, 23L, 3L),
>         RowFactory.create(12L, 24L, 3L),
>         RowFactory.create(12L, 22L, 4L),
>         RowFactory.create(13L, 22L, 4L),
>         RowFactory.create(14L, 22L, 4L)
> ));
> StructType structType = new StructType();
> structType = structType.add("a", DataTypes.LongType, false)
>         .add("b", DataTypes.LongType, false)
>         .add("c", DataTypes.LongType, false);
> ExpressionEncoder<Row> encoder = RowEncoder.apply(structType);
>
> Dataset<Row> ds = spark.createDataFrame(rdd, encoder.schema());
> ds.show();
>
> MyMapPartition mp = new MyMapPartition();
> Dataset<Row> grouped = ds.groupBy("a", "b", "c")
>         .count()
>         .repartition(new Column("a"), new Column("b"))
>         .mapPartitions(mp, encoder);
>
> grouped.count();
>
> ---------------
>
> output
> --------
> --------
> [12,23,3,1]
> >>>>
> --------
> [14,22,4,1]
> >>>>
> --------
> [12,24,3,1]
> >>>>
> --------
> [12,22,4,1]
> >>>>
> --------
> [11,22,1,1]
> [11,22,2,1]
> >>>>
> --------
> [11,21,1,1]
> >>>>
> --------
> [13,22,4,1]
> >>>>
>
>
> On Wed, Oct 18, 2017 at 10:29 AM, ayan guha <guha.a...@gmail.com> wrote:
>
>> How or what do you want to achieve?
>> I.e., are you planning to do some aggregation on group by c1, c2?
>>
>> On Wed, 18 Oct 2017 at 4:13 pm, Imran Rajjad <raj...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I have a set of rows that are the result of a
>>> groupBy(col1, col2, col3).count().
>>>
>>> Is it possible to map rows belonging to a unique combination inside an
>>> iterator?
>>>
>>> e.g.
>>>
>>> col1  col2  col3
>>> a     1     a1
>>> a     1     a2
>>> b     2     b1
>>> b     2     b2
>>>
>>> how can I separate rows with (col1, col2) = (a, 1) and (b, 2)?
>>>
>>> regards,
>>> Imran
>>>
>>> --
>>> I.R
>>>
>> --
>> Best Regards,
>> Ayan Guha
>
>
> --
> I.R

--
I.R